**Mikołaj Bojańczyk Alex Simpson (Eds.)**

# **Foundations of Software Science and Computation Structures**

**22nd International Conference, FOSSACS 2019, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019 Prague, Czech Republic, April 6–11, 2019, Proceedings**

# Lecture Notes in Computer Science 11425

Commenced Publication in 1973. Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

#### Editorial Board Members

David Hutchison, UK; Takeo Kanade, USA; Josef Kittler, UK; Jon M. Kleinberg, USA; Friedemann Mattern, Switzerland; John C. Mitchell, USA; Moni Naor, Israel; C. Pandu Rangan, India; Bernhard Steffen, Germany; Demetri Terzopoulos, USA; Doug Tygar, USA

# Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy; Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany; Benjamin C. Pierce, University of Pennsylvania, USA; Bernhard Steffen, University of Dortmund, Germany; Deng Xiaotie, Peking University, Beijing, China; Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at http://www.springer.com/series/7407

# Foundations of Software Science and Computation Structures

22nd International Conference, FOSSACS 2019 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019 Prague, Czech Republic, April 6–11, 2019 Proceedings

Editors Mikołaj Bojańczyk University of Warsaw Warsaw, Poland

Alex Simpson University of Ljubljana Ljubljana, Slovenia

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-030-17126-1 ISBN 978-3-030-17127-8 (eBook) https://doi.org/10.1007/978-3-030-17127-8

Library of Congress Control Number: 2019936298

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2019. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

# ETAPS Foreword

Welcome to the 22nd ETAPS! This is the first time that ETAPS has taken place in the Czech Republic, in its beautiful capital Prague.

ETAPS 2019 was the 22nd instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming languages, analysis tools, formal approaches to software engineering, and security.

Organizing these conferences in a coherent, highly synchronized conference program enables participation in an exciting event, offering the possibility to meet many researchers working in different directions in the field and to easily attend talks of different conferences. ETAPS 2019 featured a new program item: the Mentoring Workshop. This workshop is intended to help early-career students with advice on research, career, and life in the fields of computing covered by the ETAPS conferences. On the weekend before the main conference, numerous satellite workshops took place and attracted many researchers from all over the globe.

ETAPS 2019 received 436 submissions in total, 137 of which were accepted, yielding an overall acceptance rate of 31.4%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2019 featured the unifying invited speakers Marsha Chechik (University of Toronto) and Kathleen Fisher (Tufts University) and the conference-specific invited speakers (FoSSaCS) Thomas Colcombet (IRIF, France) and (TACAS) Cormac Flanagan (University of California at Santa Cruz). Invited tutorials were provided by Dirk Beyer (Ludwig Maximilian University) on software verification and Cesare Tinelli (University of Iowa) on SMT and its applications. On behalf of the ETAPS 2019 attendants, I thank all the speakers for their inspiring and interesting talks!

ETAPS 2019 took place in Prague, Czech Republic, and was organized by Charles University. Charles University was founded in 1348 and was the first university in Central Europe. It currently hosts more than 50,000 students. ETAPS 2019 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Jan Vitek and Jan Kofron (general chairs), Barbora Buhnova, Milan Ceska, Ryan Culpepper, Vojtech Horky, Paley Li, Petr Maj, Artem Pelenitsyn, and David Safranek.

The ETAPS SC consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Gilles Barthe (Madrid), Holger Hermanns (Saarbrücken), Joost-Pieter Katoen (chair, Aachen and Twente), Gerald Lüttgen (Bamberg), Vladimiro Sassone (Southampton), Tarmo Uustalu (Reykjavik and Tallinn), and Lenore Zuck (Chicago). Other members of the SC are: Wil van der Aalst (Aachen), Dirk Beyer (Munich), Mikolaj Bojanczyk (Warsaw), Armin Biere (Linz), Luis Caires (Lisbon), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan), Jurriaan Hage (Utrecht), Rainer Hähnle (Darmstadt), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki), Barbara König (Duisburg), Kim G. Larsen (Aalborg), Matteo Maffei (Vienna), Tiziana Margaria (Limerick), Peter Müller (Zurich), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham), Andrew M. Pitts (Cambridge), Dave Sands (Gothenburg), Don Sannella (Edinburgh), Alex Simpson (Ljubljana), Gabriele Taentzer (Marburg), Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas Vojnar (Brno), Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Lijun Zhang (Beijing).

I would like to take this opportunity to thank all speakers, attendants, organizers of the satellite workshops, and Springer for their support. I hope you all enjoy the proceedings of ETAPS 2019. Finally, a big thanks to Jan and Jan and their local organization team for all their enormous efforts enabling a fantastic ETAPS in Prague!

February 2019

Joost-Pieter Katoen
ETAPS SC Chair
ETAPS e.V. President

# Preface

This volume contains the papers presented at the 22nd International Conference on Foundations of Software Science and Computation Structures (FoSSaCS), which took place in Prague during April 8–11, 2019. The conference is dedicated to foundational research with a clear significance for software science. It brings together research on theories and methods to support the analysis, integration, synthesis, transformation, and verification of programs and software systems.

The volume contains 29 contributed papers selected from 85 full paper submissions, and also a paper accompanying an invited talk by Thomas Colcombet (IRIF, France). Each submission was reviewed by at least three Program Committee members, with the help of external reviewers, and the final decisions took into account the feedback from a rebuttal phase. The conference submissions were managed using the EasyChair system, which was also used to assist with the compilation of the proceedings.

We wish to thank all the authors who submitted to FoSSaCS 2019, the Program Committee members, and the external reviewers. In addition, we would like to thank the ETAPS organization for providing an excellent environment for FoSSaCS alongside the other ETAPS conferences and workshops.

February 2019

Mikołaj Bojańczyk
Alex Simpson

# Organization

#### Program Committee

Mikołaj Bojańczyk, University of Warsaw; Dexter Kozen, Cornell University, USA; Orna Kupferman, Hebrew University, Israel; Angelo Montanari, University of Udine, Italy; James Worrell, University of Oxford, UK

Luca Aceto, Reykjavik University, Iceland; Achim Blumensath, Masaryk University, Brno, Czech Republic; Agata Ciabattoni, Vienna University of Technology, Austria; Flavio Corradini, University of Camerino, Italy; Nathanaël Fijalkow, CNRS, LaBRI, University of Bordeaux, France; Sergey Goncharov, FAU Erlangen-Nürnberg, Germany; Matthew Hague, Royal Holloway University of London, UK; Chris Heunen, The University of Edinburgh, UK; Patricia Johann, Appalachian State University, USA; Bartek Klin, University of Warsaw, Poland; Naoki Kobayashi, The University of Tokyo, Japan; Paul Blain Levy, University of Birmingham, UK; Peter Lefanu Lumsdaine, Stockholm University, Sweden; Radu Mardare, Aalborg University, Denmark; Anca Muscholl, LaBRI, University of Bordeaux, France; Rasmus Ejlers Møgelberg, IT University of Copenhagen, Denmark; K. Narayan Kumar, Chennai Mathematical Institute, India; Dirk Pattinson, The Australian National University, Australia; Daniela Petrisan, Université Paris Diderot - Paris 7, France; Davide Sangiorgi, University of Bologna, Italy; Alex Simpson, University of Ljubljana, Slovenia; Ana Sokolova, University of Salzburg, Austria

#### Additional Reviewers

Achilleos, Antonis Ahn, Ki Yung Ahrens, Benedikt Andres Martinez, Pablo Atig, Mohamed Faouzi Atkey, Robert Bacci, Giorgio Bacci, Giovanni

Bahr, Patrick Bartocci, Ezio Basold, Henning Becker, Ruben Benerecetti, Massimo Bernardi, Giovanni Blahoudek, František Blondin, Michael

Bonchi, Filippo Bresolin, Davide Bruyère, Véronique Cacciagrano, Diletta Romana Cassar, Ian Cerna, David Chakraborty, Soham Chen, Xiaohong Clouston, Ranald Dal Lago, Ugo de Frutos Escrig, David de Paiva, Valeria Degorre, Aldric Della Monica, Dario Din, Crystal Chang Dougherty, Daniel Doumane, Amina Dubut, Jérémy Emmi, Michael Enrique Moliner, Pau Escardo, Martin Faella, Marco Ferreira, Carla Furber, Robert Fábregas, Ignacio Gadducci, Fabio Galesi, Nicola García-Pérez, Álvaro Gastin, Paul Gavazzo, Francesco Gorogiannis, Nikos Goubault-Larrecq, Jean Grädel, Erich Haar, Stefan Hamana, Makoto Haselwarter, Philipp Hasuo, Ichiro Hausmann, Daniel Heindel, Tobias Herbreteau, Frédéric Hoshino, Naohiko Hosseini, Mehran Hunt, Seb Hyvernat, Pierre Jaber, Guilhem Jacq, Clément

Johnsen, Einar Broch Kaarsgaard, Robin Kaminski, Benjamin Lucien Kammar, Ohad Karvonen, Martti Katsumata, Shin-Ya Kerjean, Marie Kop, Cynthia Kurz, Alexander Kuznets, Roman Kučera, Antonín Laird, James Lefaucheux, Engel Leitsch, Alexander Leroux, Jérôme Lhote, Nathan Lindley, Sam Loreti, Michele Mamouras, Konstantinos Marsden, Dan Masini, Andrea Mazowiecki, Filip Mazza, Damiano Mellies, Paul-Andre Melliès, Paul-André Merelli, Emanuela Mostarda, Leonardo Mukund, Madhavan Neves, Renato Norman, Gethin North, Paige Ohlmann, Pierre Olarte, Carlos Oortwijn, Wytse Otop, Jan Paquet, Hugo Pedersen, Mathias Ruggaard Perez, Guillermo Peron, Adriano Petrov, Tatjana Pédrot, Pierre-Marie Pérez, Jorge A. Quaas, Karin Ramanujam, R. Rampersad, Narad Rauch, Christoph

Re, Barbara Rehak, Vojtech Sala, Pietro Schoepp, Ulrich Schrijvers, Tom Schröder, Lutz Schwoon, Stefan Sin'Ya, Ryoma Sobocinski, Pawel Sojakova, Kristina Staton, Sam Sumii, Eijiro Sutre, Grégoire Tang, Qiyi Tesei, Luca Thinnayam, Ramanathan Tiezzi, Francesco

Tschaikowski, Max Tsukada, Takeshi Turrini, Andrea Unno, Hiroshi Uustalu, Tarmo van Dijk, Tom van Heerdt, Gerco Vicary, Jamie Vidal, German Vignudelli, Valeria Voigtländer, Janis Wallbridge, James Weil, Pascal Winskel, Glynn Wojtczak, Dominik Wolter, Uwe Ziemiański, Krzysztof

# Contents





# **Universal Graphs and Good for Games Automata: New Tools for Infinite Duration Games**

Thomas Colcombet<sup>1</sup> and Nathanaël Fijalkow<sup>2</sup>

<sup>1</sup> CNRS, IRIF, Université Paris-Diderot, Paris, France thomas.colcombet@irif.fr

<sup>2</sup> CNRS, LaBRI, Université de Bordeaux, Bordeaux, France

**Abstract.** In this paper, we give a self-contained presentation of a recent breakthrough in the theory of infinite duration games: the existence of a quasipolynomial time algorithm for solving parity games. We introduce for this purpose two new notions: good for small games automata and universal graphs.

The first object, good for small games automata, induces a generic algorithm for solving games by reduction to safety games. We show that it is in a strong sense equivalent to the second object, universal graphs, which is a combinatorial notion easier to reason with. Our equivalence result is very generic in that it holds for all existential memoryless winning conditions, not only for parity conditions.

### **1 Introduction**

In this paper, we are interested in the complexity of deciding the winner of finite turn-based perfect-information antagonistic two-player games. Typically, we are interested in parity games, mean-payoff games, Rabin games, etc.

In particular we revisit the recent advances showing that deciding the winner of parity games can be done in quasipolynomial time. Whether parity games can be solved in polynomial time is the main open question in this research area, and an efficient algorithm would have far-reaching consequences in verification, synthesis, logic, and optimisation. From a complexity-theoretic point of view, this is an intriguing puzzle: the decision problem is in **NP** and in **coNP**, implying that it is very unlikely to be **NP**-complete (otherwise **NP** = **coNP**). Yet no polynomial-time algorithm is known. For decades the best algorithms were exponential or mildly subexponential, most of them of the form n^{O(d)}, where n is the number of vertices and d the number of priorities (we refer to Section 2 for the role of these parameters).

Recently, Calude, Jain, Khoussainov, Li, and Stephan [CJK+17] constructed a quasipolynomial time algorithm for solving parity games, of complexity

This work was supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 670624), and by the DeLTA ANR project (ANR-16-CE40-0007).

n^{O(log d)}. Two subsequent algorithms with similar complexity were constructed by Jurdziński and Lazić [JL17], and by Lehtinen [Leh18].

Our aim in this paper is to understand these results through the prism of good for small games automata, which are used to construct generic reductions to solving safety games. A good for small games automaton can be understood as an approximation of the original winning condition which is correct for small games. The size of good for small games automata being critical in the complexity of these algorithms, we aim at understanding this parameter better.

A concrete instantiation of good for small games automata is the notion of separating automata, which was introduced by Bojańczyk and Czerwiński [BC18] to reformulate the first quasipolynomial time algorithm of [CJK+17]. Later, Czerwiński, Daviaud, Fijalkow, Jurdziński, Lazić, and Parys [CDF+19] showed that the other two quasipolynomial time algorithms can also be understood as the construction of separating automata, and proved a quasipolynomial lower bound on the size of separating automata.

In this paper, we establish in particular Theorem 9, which states an equivalence between the sizes of good for small games automata, of nondeterministic separating automata, of deterministic separating automata, and of universal graphs. This statement is generic in the sense that it holds for any winning condition which is memoryless for the existential player, hence in particular for parity conditions. At a technical level, the key notion that we introduce to show this equivalence is the combinatorial concept of universal graphs.

Our second contribution, Theorem 10, holds for the parity condition only, and is a new equivalence between universal trees and universal graphs. In particular we use a technique of saturation of graphs which greatly simplifies the arguments. The two theorems together give an alternative, simpler proof of the result in [CDF+19].

Let us mention that the equivalence results have been very recently used to construct algorithms for mean-payoff games, leading to improvements over the best known algorithm [FGO18].

*Structure of the paper* In Section 2 we introduce the classical notions of games, automata, and good for games automata. In Section 3, we introduce the notion of good for small games automata, and show that, in the context of winning conditions that are memoryless for the existential player, these automata can be characterised in several different ways, using in particular universal graphs (Theorem 9). In Section 4, we study more precisely the case of parity conditions.

#### **2 Games and automata**

We describe in this section classical material: arenas, games, strategies, automata, and good for games automata. Section 2.1 introduces games, Section 2.2 the concept of memoryless strategy, and Section 2.3 the class of automata we use. Finally, Section 2.4 explains how automata can be used for solving games, and in particular defines the notion of automata that are good for games.

#### **2.1 Games**

We will consider several forms of graphs, all of which are directed labelled graphs with a root vertex. Let us fix the terminology now. Given a set X, an X-graph H = (V, E, root_H) has a set of vertices V, a set of X-labelled edges E ⊆ V × X × V, and a root vertex root_H. We write x −u→_H y if there exists a path from vertex x to vertex y labelled by the word u ∈ X*. We write x −u→_H ∞ if there exists an infinite path starting in vertex x labelled by the word u ∈ C^ω, where u ∈ X^ω. The graph is trimmed if all vertices are reachable from the root and have outdegree at least one. Note that as soon as a graph contains some infinite path starting from the root, it can be made trimmed by removing the bad vertices. A morphism of X-graphs from G to H is a map α from the vertices of G to the vertices of H that sends the root of G to the root of H and sends each edge of G to an edge of H, i.e., for all a ∈ X, p −a→_G q implies α(p) −a→_H α(q). A weak morphism of X-graphs is like a morphism, except that we drop the requirement that the root of G is sent to the root of H and instead require that if root_G −a→_G x then root_H −a→_H α(x).
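To make the trimming operation concrete, here is a small illustrative sketch in Python (our own rendering, not from the paper): a finite X-graph is represented by its vertex set and a set of labelled edges (source, label, target).

```python
def trim(vertices, edges, root):
    """Compute the trimmed subgraph: repeatedly discard vertices of
    outdegree zero, then keep only the vertices reachable from the root."""
    vs = set(vertices)
    # Removing a dead end can create new dead ends, so iterate to a fixpoint.
    changed = True
    while changed:
        live = {v for v in vs if any(p == v and q in vs for (p, _, q) in edges)}
        changed = live != vs
        vs = live
    # Depth-first search for the vertices reachable from the root.
    reach, stack = set(), ([root] if root in vs else [])
    while stack:
        v = stack.pop()
        if v not in reach:
            reach.add(v)
            stack.extend(q for (p, _, q) in edges if p == v and q in vs)
    kept = {(p, a, q) for (p, a, q) in edges if p in reach and q in reach}
    return reach, kept
```

For instance, a vertex with no outgoing edge and a vertex unreachable from the root both disappear, together with their incident edges.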

**Definition 1.** *Let* C *be a set (of colors). A* C*-arena* A *is a* C*-graph in which the vertices are split into* V = V_E ⊎ V_A. *The vertices are called positions. The positions in* V_E *are the positions owned by the existential player, and the ones in* V_A *are owned by the universal player. The root is the initial position. The edges are called moves. Infinite paths starting in the initial position are called plays. Finite paths starting in the initial position are called partial plays. The dual of an arena is obtained by swapping* V_A *and* V_E*, i.e., exchanging the ownership of the positions.*

*<sup>A</sup>* <sup>W</sup>*-game* <sup>G</sup> = (A,W) *consists of a* <sup>C</sup>*-arena* <sup>A</sup> *together with a set* <sup>W</sup> <sup>⊆</sup> <sup>C</sup><sup>ω</sup> *called the winning condition.*

*For simplicity, we assume in this paper the following epsilon property*1*: there is a special color* <sup>ε</sup> <sup>∈</sup> <sup>C</sup> *such that for all words* u, <sup>v</sup> <sup>∈</sup> <sup>C</sup><sup>ω</sup>*, if* <sup>u</sup> *and* <sup>v</sup> *are equal after removing all the* <sup>ε</sup>*-letters, then* <sup>u</sup> <sup>∈</sup> <sup>W</sup> *if and only if* <sup>v</sup> <sup>∈</sup> <sup>W</sup>*.*

*The dual of a game is obtained by dualising the arena, and complementing the winning condition.*

If one compares with usual games – for instance checkers – then the arena represents the set of possible board configurations of the game (typically, the configuration of the board plus a bit telling whose turn it is to play). The configuration is an existential position if it is the first player's turn to play, otherwise it is a universal position. There is an edge from u to v if it is a valid move for the player to go from configuration u to configuration v. The interest of having

<sup>1</sup> This assumption is satisfied in an obvious way by all winning conditions seen in this paper. It could be avoided, but at the technical price of considering slightly different forms of games: games in which the moves are positive boolean combinations of pairs of colors and positions. Such 'move relations' form a joint generalisation of existential positions (which can be understood as logical disjunction) and universal positions (which can be understood as logical conjunction).

colors and winning conditions may not appear clearly in this context, but the intent would be, for example, to tell who is the winner if the play is infinite.

Informally, the game is played as follows by two players: the existential player and the universal player<sup>2</sup>. At the beginning, a token is placed at the initial position of the game. Then the game proceeds in rounds. At each round, if the token is on an existential position then it is the existential player's turn to play, otherwise it is the universal player's turn. This player chooses an outgoing move from the position, and the token is pushed along this move. This interaction continues forever, inducing a play (defined as an infinite path in the arena) labelled by an infinite sequence of colors. If this infinite sequence belongs to the winning condition W, then the existential player wins the play, otherwise, the universal player wins the play. It may happen that a player has to play but there is no move available from the current position: in this case the player immediately loses.
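On a finite arena, the convention that a stuck player immediately loses already determines the winner of the simplest games, namely those in which every infinite play is won by the existential player. The following Python sketch (our own illustration; the encoding of moves as a set of pairs is an assumption, not from the paper) computes the positions from which the existential player wins such a game, by shrinking the candidate set to a greatest fixpoint:

```python
def existential_winning_region(pos_E, pos_A, moves):
    """Positions from which the existential player wins the game in which a
    player who cannot move loses and every infinite play is won by the
    existential player. An existential position must keep some move inside W;
    a universal position must have all its moves inside W (a universal
    position with no move at all is immediately winning)."""
    W = set(pos_E) | set(pos_A)
    changed = True
    while changed:
        changed = False
        for v in list(W):
            if v in pos_E:
                ok = any(p == v and q in W for (p, q) in moves)
            else:
                ok = all(q in W for (p, q) in moves if p == v)
            if not ok:
                W.discard(v)
                changed = True
    return W
```

Note how the stuck-player convention appears twice: an existential position with no remaining safe move is removed, while a stuck universal position counts as winning for the existential player.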

*Classical winning conditions* Before describing more precisely the semantics of games, let us recall what are the classical winning conditions considered in this context.

#### **Definition 2.** *We define the following classical winning conditions:*


<sup>2</sup> In the literature, the players have many other names: 'Eve' and 'Adam', 'Eloise' and 'Abelard', 'Exist' and 'Forall', '0' and '1', or in specific contexts: 'Even' and 'Odd', 'Automaton' and 'Pathfinder', 'Duplicator' and 'Spoiler', . . .

**mean-payoff condition** *Given a finite set* C ⊆ ℝ*, a word* u = c₁c₂c₃ ⋯ ∈ C^ω *belongs to* meanpayoff_C *if*

$$\liminf\_{n \to \infty} \frac{c\_1 + c\_2 + \dots + c\_n}{n} \geqslant 0 \text{ .}$$

*There are many variants of this definition (such as replacing* lim inf *with* lim sup*), that all turn out to be equivalent on finite arenas.*

*Strategies* We describe now formally what it means to win a game. Let us take the point of view of the existential player. A strategy for the existential player is an object that describes how to play in every situation of the game that could be reached. It is a winning strategy if whenever these choices are respected during a play, the existential player wins this play. There are several ways one can define the notion of a strategy. Here we choose to describe a strategy as the set of partial plays that may be produced when it is used.

**Definition 3.** *A strategy* s_E *for the existential player is a set of partial plays of the game that has the following properties:*

1. s_E *is closed under taking prefixes, and contains the empty partial play;*
2. *if a partial play in* s_E *ends in a position owned by the existential player, then exactly one of its extensions by a move belongs to* s_E*;*
3. *if a partial play in* s_E *ends in a position owned by the universal player, then all of its extensions by a move belong to* s_E*.*
*A play is compatible with the strategy* s_E *if all its finite prefixes belong to* s_E*. A play is winning if it belongs to the winning condition* W*. A game is won by the existential player if there exists a strategy for the existential player such that all plays compatible with it are won by the existential player. Such a strategy is called a winning strategy.*

*Symmetrically, a (winning) strategy for the universal player is a (winning) strategy for the existential player in the dual game. A game is won by the universal player if there exists a strategy for the universal player such that all infinite plays compatible with it are won by the universal player.*

The idea behind this definition is that at any moment in the game, when following a strategy, a sequence of moves has already been played, yielding a partial play in the arena. The above definition guarantees that:

1. if a partial play belongs to the strategy, it is indeed reachable by a succession of moves that stay in the strategy;
2. if, while following the strategy, a partial play ends in a vertex owned by the existential player, there exists exactly one move that can be followed by the strategy at that moment; and
3. if, while following the strategy, a partial play ends in a vertex owned by the universal player, the strategy is able to face all possible choices of the opponent.

*Remark 1.* It is not possible that a strategy defined in this way reaches an existential position that has no successor: indeed, condition 2 would not hold.

*Remark 2.* There are different ways to define a strategy in the literature. One is as a strategy tree: indeed, one can see s_E as a set of nodes equipped with the prefix ordering as the ancestor relation. Another way is to define a strategy as a partial map from paths to moves. All these definitions are equivalent. The literature also considers randomized strategies (in which the next move is chosen following a probability distribution): these are essential when games are *concurrent* or *with partial information*, but not in the situation we consider in this paper.

**Lemma 1 (at most one player wins).** *It is not possible that both the existential player and the universal player win the same game.*

Of course, keeping the intuition of games in mind, one would also expect that one of the players wins. However, this is not necessarily the case. A game is called determined if either the existential or the universal player wins the game. The fact that a game is determined is referred to as its determinacy. A winning condition W is determined if all W-games are determined. It happens that not all games are determined.

**Theorem 1.** *There exist winning conditions that are not determined (and it requires the axiom of choice to prove it).*

However, there are some situations in which games are determined. This is the case of finite duration games, of safety games, and more generally:

**Theorem 2 (Martin's theorem of Borel determinacy [Mar75]).** *Games with Borel winning conditions are determined.*

Defining the notion of Borel sets is beyond the scope of this paper. It suffices to know that this notion is sufficiently powerful for capturing a lot of natural winning conditions, and in particular all winning conditions in this paper are Borel; and thus determined.

#### **2.2 Memory of strategies**

A key insight in understanding a winning condition is to study the amount of memory required by winning strategies. To define the notion of memoryless strategies, we use an equivalent point of view on strategies, using strategy graphs.

**Definition 4.** *Given a* C*-arena* A*, an existential player strategy graph* (S_E, γ) *in* A *is a trimmed* C*-graph* S_E *together with a graph morphism* γ *from* S_E *to* A *such that for all vertices* x *in* S_E*,*


*The existential player strategy graph* (S_E, γ) *is memoryless if* γ *is injective. In general, the memory of the strategy is the maximal cardinality of* γ⁻¹(v) *for* v *ranging over all positions of the arena. For* G *a* W*-game with* W ⊆ C^ω*, an existential player strategy graph* S_E *is winning if the labels of all its paths issued from the root belong to* W*.*

*The (winning) universal player strategy graphs are defined as the (winning) existential player strategy graphs in the dual game.*

*The winning condition* W *is memoryless for the existential player if, whenever the existential player wins in a* W*-game, there is a memoryless winning existential player strategy graph. It is memoryless for the existential player over finite arenas if this holds for finite* W*-games only. The dual notion is the one of memoryless for the universal player winning condition.*

Of course, as far as existence is concerned the two notions of strategy coincide:

**Lemma 2.** *There exists a winning existential player strategy graph if and only if there exists a winning strategy for the existential player.*

*Proof.* A strategy s_E for the existential player can be seen as a C-graph (in fact a tree) S_E with vertex set s_E, root ε, and edges of the form (π, a, πa) for all πa ∈ s_E. If the strategy s_E is winning, then the strategy graph S_E is also winning. Conversely, given an existential player strategy graph S_E, the set s_E of its paths starting from the root is itself a strategy for the existential player. Again, the winning property is preserved.

We list a number of important results stating that some winning conditions do not require memory.

**Theorem 3 ([EJ91]).** *The parity condition is memoryless for the existential player and for the universal player.*

**Theorem 4 ([EM79,GKK88]).** *The mean-payoff condition is memoryless for the existential player over finite arenas as well as for the universal player.*

**Theorem 5 ([GH82]).** *The Rabin condition is memoryless for the existential player, but not in general for the universal player.*

**Theorem 6 ([McN93]).** *Muller conditions are finite-memory for both players.*

**Theorem 7 ([CFH14]).** *Topologically closed conditions for which the residuals are totally ordered by inclusion are memoryless for the existential player.*

#### **2.3 Automata**

**Definition 5 (automata over infinite words).** *Let* W ⊆ C<sup>ω</sup>*. A (nondeterministic)* W*-automaton* A *over the alphabet* A *is a* (C × A)*-graph. The convention is to call its vertices states, and its edges transitions. The root vertex is called the initial state. The set* W *is called the accepting condition (whereas it is the winning condition for games). The automaton* A<sub>p</sub> *is obtained from* A *by setting the state* p *to be initial.*

*<sup>A</sup> run of the automaton* <sup>A</sup> *over* <sup>u</sup> <sup>∈</sup> <sup>A</sup><sup>ω</sup> *is an infinite path in* <sup>A</sup> *that starts in the initial state and projects on its* A*-component to* u*. A run is accepting if it projects on its* <sup>C</sup>*-component to a word* <sup>v</sup> <sup>∈</sup> <sup>W</sup>*. The language accepted by* <sup>A</sup> *is the set* <sup>L</sup>(A) *of infinite words* <sup>u</sup> <sup>∈</sup> <sup>A</sup><sup>ω</sup> *such that there exists an accepting run of* <sup>A</sup> *on* <sup>u</sup>*.*

*An automaton is deterministic (resp. complete) if for all states* p *and all letters* a ∈ A*, there exists at most one (resp. at least one) transition of the form* (p, (a, c), q)*. If the winning condition is parity, this is a parity automaton. If the winning condition is safety, this is a safety automaton, and we omit the* C*-component since there is only one color: the transitions form a subset of* Q × A × Q*, and the notion coincides with that of an* A*-graph. For this reason, we may refer to the language* L(H) *accepted by an* A*-graph* H*: this is the set of labelling words of infinite paths starting in the root vertex of* H*.*
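The determinism and completeness conditions just defined can be checked mechanically. A toy Python fragment (the function name and the tuple encoding of transitions are ours, not the paper's):

```python
# Illustrative sketch: transitions of a W-automaton as triples (p, (a, c), q),
# and the determinism / completeness checks of Definition 5.
from collections import defaultdict

def classify(transitions, states, alphabet):
    """Return (deterministic, complete) for the given transition set."""
    successors = defaultdict(list)
    for (p, (a, _c), q) in transitions:
        successors[(p, a)].append(q)
    deterministic = all(len(successors[(p, a)]) <= 1
                        for p in states for a in alphabet)
    complete = all(len(successors[(p, a)]) >= 1
                   for p in states for a in alphabet)
    return deterministic, complete

# state q has no transition on letter "y": deterministic but not complete
trans = {("p", ("x", 0), "p"), ("p", ("y", 1), "q"), ("q", ("x", 0), "q")}
assert classify(trans, {"p", "q"}, {"x", "y"}) == (True, False)
```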

Note that here we use non-deterministic automata for simplicity. However, the notions developed in this paper can be adapted to alternating automata.

*The notion of* ω*-regularity.* It is not the purpose of this paper to describe the rich theory of automata over infinite words. It suffices to say that a robust concept of ω-regular language emerges. These are the languages that are equivalently defined by means of Büchi automata, parity automata, Rabin automata, Muller automata, deterministic parity automata, deterministic Rabin automata, deterministic Muller automata, as well as many other formalisms (regular expressions, monadic second-order logic, ω-semigroups, alternating automata, . . . ). However, safety automata and deterministic Büchi automata define a strict subclass of ω-regular languages.

Note that the mean-payoff condition does not fall in this category, and automata defined with this condition do not recognize ω-regular languages in general.

#### **2.4 Automata for solving games**

There is a long tradition of using automata for solving games. The general principle is to use automata as reductions: starting from a V-game G and a W-automaton A that accepts the language V, we construct a W-game G×A called the product game that combines the two, and which is expected to have the same winner. This means that to solve the V-game G, it is enough to solve the W-game G×A. We shall see below that, unfortunately, this expected property does not always hold (Remark 4). The automata that guarantee the correctness of the construction are called good for games; they were originally introduced by Henzinger and Piterman [HP06].

We begin our description by making precise the notion of product game. Informally, the new game requires the players to play like in the original game, and after each step, the existential player is required to provide a transition in the automaton that carries the same label.

**Definition 6.** *Let* D *be an arena over colors* C*, with positions* P *and moves* M*. Let also* A *be a* W*-automaton over the alphabet* C *with states* Q *and transitions* Δ*. We construct the product arena* D × A *as follows:*

- *its positions are the game positions* P × Q *together with the automaton positions* M × Q*; a game position* (x, p) *is owned by the owner of* x*, and an automaton position is owned by the existential player;*
- *its moves are the game moves* ((x, p), ε, ((x, c, y), p)) *for all moves* (x, c, y) ∈ M *and states* p ∈ Q*, together with the automaton moves* (((x, c, y), p), d, (y, q)) *for all moves* (x, c, y) ∈ M *and transitions* (p, (c, d), q) ∈ Δ*;*
- *its root is the pair of the roots of* D *and* A*.*

*Note that every game move* ((x, p), ε, ((x, c, y), p)) *of* G×A *can be transformed into a move* (x, c, y) *of* G*, called its game projection. Similarly, every automaton move* (((x, c, y), p), d, (y, q)) *can be turned into a transition* (p, (c, d), q) *of the automaton* A*, called its automaton projection. Hence, every play* π′ *of the product game can be projected to a pair consisting of a play* π *in* G *of label* u *(called the game projection), and an infinite run* ρ *of the automaton over* u *(called the automaton projection). The product game is the game over the product arena, using the winning condition of the automaton.*
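The two shapes of moves just described can be sketched in Python. All encodings here (tuples for positions, the string `"eps"` for ε, the function name `product_moves`) are illustrative choices of ours, not the paper's:

```python
# Toy sketch of the moves of the product arena: a game move commits to an
# original move without emitting a color, then an automaton move picks a
# matching transition and emits its output color.

EPS = "eps"  # stands for the silent color epsilon

def product_moves(game_moves, transitions, states):
    moves = []
    for (x, c, y) in game_moves:
        for p in states:
            # game move: ((x, p), eps, ((x, c, y), p))
            moves.append(((x, p), EPS, ((x, c, y), p)))
    for (x, c, y) in game_moves:
        for (p, (c2, d), q) in transitions:
            if c2 == c:
                # automaton move: (((x, c, y), p), d, (y, q))
                moves.append((((x, c, y), p), d, (y, q)))
    return moves

assert product_moves([("u", "a", "v")], [("p", ("a", 0), "q")], ["p", "q"]) == [
    (("u", "p"), "eps", (("u", "a", "v"), "p")),
    (("u", "q"), "eps", (("u", "a", "v"), "q")),
    ((("u", "a", "v"), "p"), 0, ("v", "q")),
]
```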

**Lemma 3 (folklore**<sup>3</sup>**).** *Let* G *be a* V*-game, and* A *be a* W*-automaton that accepts a language* L ⊆ V*; if the existential player wins* G × A*, then she wins* G*.*

*Proof.* Assume that the existential player wins the game G×A using a strategy s<sub>E</sub>. This strategy can be turned into a strategy s′<sub>E</sub> for the existential player in G by performing a game projection. It is routine to check that this is a valid strategy.

Let us show that this strategy s′<sub>E</sub> is V-winning, and hence conclude that the existential player wins the game G. Indeed, let π be a play compatible with s′<sub>E</sub>, say labelled by u. This play π is obtained by game projection of a play π′ compatible with s<sub>E</sub> in G×A. The automaton projection ρ of π′ is a run of A over u, and is accepting since s<sub>E</sub> is a winning strategy. Hence, u is accepted by A and as a consequence belongs to V. We have proved that s′<sub>E</sub> is winning.

**Corollary 1.** *Let* <sup>G</sup> *be a* <sup>V</sup>*-game, and* <sup>A</sup> *be a deterministic* <sup>W</sup>*-automaton that accepts the language* <sup>V</sup>*, then the games* <sup>G</sup> *and* G×A *have the same winner.*

*Proof.* We assume without loss of generality that A is deterministic and complete (note that this may require slightly changing the accepting condition, for instance in the case of safety). The result then follows from applying Lemma 3 to the game G and to its dual.

<sup>3</sup> This technique of reduction is in fact more general, since the automaton need not be a safety automaton. Its use can be traced back, for instance, to the work of Büchi and Landweber [BL69].

The consequence of the above lemma is that when we know how to solve W-games, and we have a deterministic W-automaton A for a language V, then we can decide the winner of V-games by performing the product of the game with the automaton, and deciding the winner of the resulting game. Good for games automata are automata that need not be deterministic, but for which this kind of argument still works.

**Definition 7 (good for games automata [HP06]).** *Let* V *be a language, and* <sup>A</sup> *be a* <sup>W</sup>*-automaton. Then* <sup>A</sup> *is good for* <sup>V</sup>*-games if for all* <sup>V</sup>*-games* <sup>G</sup>*,* <sup>G</sup> *and* G×A *have the same winner.*

Note that Corollary 1 says that deterministic automata are good for games automata.

*Remark 3.* It may seem strange, a priori, not to require in the definition that L(A) = V. In fact, it holds anyway: if an automaton is good for V-games, then it accepts the language V. Indeed, assume that there exists a word u ∈ L(A) \ V; then one can construct a game G that has exactly one play, labelled u. This game is won by the universal player since u ∉ V, but the existential player wins G×A. A contradiction. The same argument works if there is a word in V \ L(A).

Examples of good for games automata can be found in [BKS17], together with a structural analysis of the extent to which they are non-deterministic.

*Remark 4.* We construct an automaton which is not good for games. The alphabet is {a, b}. The automaton A is a Büchi automaton: it has an initial state from which go two ε-transitions: the first transition guesses that the word contains infinitely many a's, and the second transition guesses that the word contains infinitely many b's. Note that any infinite word contains either infinitely many a's or infinitely many b's, so the language V recognised by this automaton is the set of all words. However, this automaton requires a choice to be made at the very first step about which of the two alternatives holds. This makes it not good for games: indeed, consider a game G where the universal player picks any infinite word, letter by letter, and the winning condition is V. It has only one position, owned by the universal player. The existential player wins G because all plays are winning. However, the existential player loses G×A, because in this game she has to declare at the first step whether there will be infinitely many a's or infinitely many b's, a claim which the universal player can later contradict.

Let us conclude this part with Lemma 4, stating that good for games automata can be composed. We first need to define the composition of automata.

Given an A × B-graph A and a B × C-graph B, the composed graph B ◦ A has as states the product of the sets of states, as initial state the ordered pair of the initial states, and a transition ((p, q), (a, c), (p′, q′)) whenever there is a transition (p, (a, b), p′) in A and a transition (q, (b, c), q′) in B. If A is in fact an automaton that uses the accepting condition V, and B an automaton that uses the accepting condition W, then the composed automaton B ◦ A has as underlying graph the composed graph, and as accepting condition W.
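The synchronisation on the shared middle component can be sketched as follows. This is a toy Python fragment; the name `compose` and the set encoding of transition relations are ours:

```python
# Toy sketch of graph composition: given an (A x B)-graph and a (B x C)-graph,
# keep the pairs of transitions that agree on the shared B-component.

def compose(trans_a, trans_b):
    """Transitions ((p, q), (a, c), (p2, q2)) whenever (p, (a, b), p2) is in
    the first graph and (q, (b, c), q2) is in the second, for some shared b."""
    return {((p, q), (a, c), (p2, q2))
            for (p, (a, b), p2) in trans_a
            for (q, (b2, c), q2) in trans_b
            if b == b2}

ta = {("p", ("a", "b"), "p")}
tb = {("q", ("b", "c"), "q")}
assert compose(ta, tb) == {(("p", "q"), ("a", "c"), ("p", "q"))}
```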

**Lemma 4 (composition of good for games automata).** *Let* A *be a good for games* W*-automaton for the language* V*, and* B *be a good for games* V*-automaton for the language* L*; then the composed automaton* A◦B *is a good for games* W*-automaton for the language* L*.*

#### **3 Efficiently solving games**

From now on, graphs, games and automata are assumed to be finite.

We now present more recent material. We put forward the notion of good for n-games automata (good for small games) as a common explanation for the several recent algorithms for solving parity games 'efficiently'. After describing this notion in Section 3.1, we shall give more insight about it in the context of winning conditions that are memoryless for the existential player in Section 3.2.

Much more can be said for parity games and good for small games safety automata: this will be the subject of Section 4.

#### **3.1 Good for small games automata**

We introduce the concept of (strongly) good for n-games automata (good for small games). The use of these automata is the same as for good for games automata, except that they cannot be composed with arbitrary games, but only with small ones. In other words, a good for (W, n)-games automaton yields a reduction for solving W-games with at most n positions (Lemma 6). We shall see in Section 3.2 that as soon as the underlying winning condition is memoryless for the existential player, there are several characterisations of the smallest strongly good for n-games automata. It is good to keep in mind the definition of good for games automata (Definition 7) when reading the following one.

**Definition 8.** *Let* <sup>V</sup> *be a language, and* <sup>A</sup> *be a* <sup>W</sup>*-automaton. Then* <sup>A</sup> *is good for* (V, n)*-games if for all* <sup>V</sup>*-games* <sup>G</sup> *with at most* <sup>n</sup> *positions,* <sup>G</sup> *and* G×A *have the same winner (we also write good for small games when there is no need for* V *and* n *to be explicit).*

*It is strongly good for* (V, n)*-games if it is good for* (V, n)*-games and the language accepted by* <sup>A</sup> *is contained in* <sup>V</sup>*.*

*Example 1 (automata that are good for small games).* We have naturally the following chain of implications:

good for games <sup>=</sup><sup>⇒</sup> strongly good for <sup>n</sup>-games <sup>=</sup><sup>⇒</sup> good for <sup>n</sup>-games

The first implication is from Remark 3, and the second is by definition. Thus the first examples of automata that are strongly good for small games are the automata that are good for games.

*Example 2.* We consider the case of the coBüchi condition: recall that the set of colors is {0, 1} and the winning plays are the ones that ultimately contain only 0's. It can be shown that if the existential player wins in a coBüchi game with at most n positions, then she also wins for the winning condition L = (0<sup>∗</sup>(ε + 1))<sup>n</sup>0<sup>ω</sup>, i.e., the words in which there are at most n occurrences of 1. (Indeed, a winning memoryless strategy for the coBüchi condition cannot contain a 1 in a cycle, and hence cannot contain more than n occurrences of 1 in the same play; thus the same strategy is also winning in the same game with the new winning condition L.) As a consequence, a deterministic safety automaton that accepts the language L ⊆ coBüchi (the minimal one has n + 1 states) is good for (coBüchi, n)-games.
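The minimal deterministic safety automaton for L can be sketched directly: its n + 1 states count the occurrences of 1 seen so far, and the run dies at the (n+1)-th occurrence. The following toy Python fragment (names are ours) checks finite prefixes only, which is all a safety condition constrains:

```python
# Sketch of the (n+1)-state deterministic safety automaton for
# L = (0*(eps + 1))^n 0^omega: states 0..n count the 1's read so far.

def make_counter_automaton(n):
    def step(state, color):
        if color == 0:
            return state                       # reading 0 is always safe
        return state + 1 if state < n else None  # (n+1)-th 1: run dies
    return step

def accepts_prefix(step, word):
    """A safety automaton accepts iff its (unique) run never dies."""
    state = 0
    for color in word:
        state = step(state, color)
        if state is None:
            return False
    return True

step = make_counter_automaton(2)          # n = 2: at most two 1's allowed
assert accepts_prefix(step, [0, 1, 0, 1, 0])
assert not accepts_prefix(step, [1, 1, 1])
```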

Mimicking Lemma 4 which states the closure under composition of good for games automata, we obtain the following variant for good for small games automata:

**Lemma 5 (composition of good for small games automata).** *Let* B *be a good for* n*-games* V*-automaton for the language* L *with* k *states, and* A *be a good for* kn*-games* W*-automaton for the language* V*; then the composed automaton* A◦B *is a good for* n*-games* W*-automaton for the language* L*.*

We also directly get an algorithm from such reductions.

**Lemma 6.** *Assume that there exists an algorithm for solving* W*-games of size* m *in time* f(m)*. Let* G *be a* V*-game with at most* n *positions and* A *be a good for* (V, n)*-games* W*-automaton of size* k*; then there exists an algorithm for solving* G *of complexity* f(kn)*.*

*Proof.* Construct the game G×A, and solve it.

The third quasipolynomial time algorithm for solving parity games due to Lehtinen [Leh18] can be phrased using good for small games automata (note that it is not originally described in this form).

**Theorem 8 ([Leh18,BL19]).** *Given positive integers* n, d*, there exists a parity automaton with* n<sup>log d+O(1)</sup> *states and* 1 + log n *priorities which is strongly good for* n*-games.*

Theorem 8 combined with Lemma 6 yields a quasipolynomial time algorithm for solving parity games. Indeed, consider a parity game G with n positions and d priorities. Let A be the good for n-games automaton constructed by Theorem 8. The game G×A is a parity game equivalent to G, which has m = n<sup>log d+O(1)</sup> positions and d′ = 1 + log n priorities. Solving this parity game with a simple algorithm (of complexity O(m<sup>d′</sup>)) yields an algorithm of quasipolynomial complexity:

$$O(m^{d'}) = O(n^{(\log d + O(1))d'}) = n^{O(\log(d)\log(n))}.$$

#### **3.2 The case of memoryless winning conditions**

In this section we fix a winning condition W which is memoryless for the existential player, and we establish several results characterising the smallest strongly good for small games automata in this case.

Our prime application is the case of parity conditions, that will be studied specifically in Section 4, but this part also applies to conditions such as mean-payoff or Rabin.

The goal is to establish the following theorem (the necessary definitions are introduced during the proof).

**Theorem 9.** *Let* W *be a winning condition which is memoryless for the existential player; then the following quantities coincide for all positive integers* n*:*

*1. the least number of states of a strongly* (W, n)*-separating deterministic safety automaton;*
*2. the least number of states of a safety automaton that is strongly good for* (W, n)*-games;*
*3. the least number of states of a strongly* (W, n)*-separating safety automaton;*
*4. the least number of vertices of a* (W, n)*-universal graph.*

The idea of separating automata<sup>4</sup> was introduced by Bojańczyk and Czerwiński [BC18] to reformulate the first quasipolynomial time algorithm [CJK+17]. Czerwiński, Daviaud, Fijalkow, Jurdziński, Lazić, and Parys [CDF+19] showed that the other two quasipolynomial time algorithms [JL17,Leh18] can also be understood as the construction of separating automata.

The proof of Theorem 9 spans Sections 3.2 and 3.3. It is a consequence of Lemmas 7, 8, 11, and 12. We begin our proof of Theorem 9 by describing the notion of strongly separating automata.

**Definition 9.** *An automaton* <sup>A</sup> *is strongly* (W, n)*-separating if*

$$\mathbb{W}|_{n} \subseteq \mathcal{L}(\mathcal{A}) \subseteq \mathbb{W}\,,$$

*in which* W|<sub>n</sub> *is the union of all the languages accepted by safety automata with* n *states that accept sublanguages of* W*.*<sup>5</sup>

**Lemma 7.** *In the statement of Theorem 9, (1)* =⇒ *(2)* =⇒ *(3).*

*Proof.* Assume (1), i.e., there exists a strongly (W, n)-separating deterministic safety automaton <sup>A</sup>, then <sup>L</sup>(A) <sup>⊆</sup> <sup>W</sup>. Let <sup>G</sup> be a <sup>W</sup>-game with at most <sup>n</sup> positions. By Lemma 3, if the existential player wins G×A, she wins the

<sup>5</sup> Note that there is a natural, more symmetric, notion of (W, n)-separating automaton, in which the requested inclusions are W|<sub>n</sub> ⊆ L(A) ⊆ ∁((∁W)|<sub>n</sub>), where ∁ denotes the complement. However, nothing is known about this notion.

<sup>4</sup> The definition used in [BC18] is not strictly equivalent to the one we use here: a separating automaton in [BC18] is a strongly separating automaton in our sense, but not conversely.

game G. Conversely, assume that the existential player wins G; then, by assumption she has a winning memoryless strategy graph S<sub>E</sub>, γ : S<sub>E</sub> → G, i.e., L(S<sub>E</sub>) ⊆ W and γ is injective. By injectivity of γ, S<sub>E</sub> has at most n vertices and hence L(S<sub>E</sub>) ⊆ W|<sub>n</sub> ⊆ L(A). As a consequence, for every (partial) play π compatible with S<sub>E</sub>, there exists a (partial) run of A over the labels of π (call this property (⋆)). We construct a new strategy for the existential player in G×A as follows: when the token is in a game position, the existential player plays as in S<sub>E</sub>; when the token is in an automaton position, the existential player plays the only available move (indeed, the move exists by property (⋆), and is unique by the determinism assumption). Since this is a safety game, the new strategy is winning. Hence the existential player wins G×A, proving that A is good for (W, n)-games. Item 2 is established.

Assume now (2), i.e., that A is some strongly good for (W, n)-games automaton. Then by definition L(A) ⊆ W. Now consider some word u in W|<sub>n</sub>. By definition, there exists some safety automaton B with at most n states such that u ∈ L(B) ⊆ W. This automaton can be seen as a W-game G in which all positions are owned by the universal player. Since L(B) ⊆ W, the existential player wins the game G. Since furthermore A is good for (W, n)-games, the existential player has a winning strategy S<sub>E</sub> in G×A. Assume now that the universal player plays the letters of u in the game G×A; then the winning strategy S<sub>E</sub> constructs an accepting run of A on u. Thus u ∈ L(A), and Item 3 is established.

We continue our proof of Theorem 9 by introducing the notion of (W, n)-universal graph.

**Definition 10.** *Given a winning condition* W ⊆ C<sup>ω</sup> *and a positive integer* n*, a* C*-graph* U *is* (W, n)*-universal*<sup>6</sup> *if*

- L(U) ⊆ W*; and*
- *for every* C*-graph* H *with at most* n *vertices such that* L(H) ⊆ W*, there exists a weak graph morphism from* H *to* U*.*

We are now ready to prove one more implication of Theorem 9.

**Lemma 8.** *In the statement of Theorem 9, (4)* =⇒ *(1).*

*Proof.* Assume that there is a (W, n)-universal graph U. We show that U, seen as a safety automaton, is strongly good for (W, n)-games. One part is straightforward: L(U) ⊆ W holds by assumption. For the other part, consider a W-game G with at most n positions. Assume that the existential player wins G; this means that there exists a winning memoryless strategy graph for the existential player S<sub>E</sub>, γ : S<sub>E</sub> → G in G. We then construct a strategy for the existential player S′<sub>E</sub> that maintains the invariant that the only game positions in G × U that are met in S′<sub>E</sub> are of the form (x, γ(x)). This is done as follows: when a game position is encountered, the existential player plays like the strategy S<sub>E</sub>, and when an automaton position is encountered, the existential player plays in order to follow γ. This is possible since γ is a weak graph morphism.

<sup>6</sup> Note that this is not the notion of (even weak) universality in categorical terms since U is not in general itself of size n.

#### **3.3 Maximal graphs**

In order to continue our proof of Theorem 9, more insight is needed: we have to understand what the W-maximal graphs are. This is what we do now.

**Definition 11.** *<sup>A</sup>* <sup>C</sup>*-graph* <sup>H</sup> *is* <sup>W</sup>*-maximal if* <sup>L</sup>(H) <sup>⊆</sup> <sup>W</sup> *and if it is not possible to add a single edge to it without breaking this property, i.e., without producing an infinite path from the root vertex that does not belong to* W*.*

**Lemma 9.** *For a winning condition* W ⊆ C<sup>ω</sup> *which is memoryless for the existential player, and* H *a* W*-maximal graph, the* ε*-edges in* H *form a transitive and total relation.*

*Proof.* Transitivity arises from the epsilon property of winning conditions (Definition 1): Consider three vertices x, y and z such that α = (x, ε, y) and β = (y, ε, z) are edges of H. Let us add a new edge δ = (x, ε, z), yielding a new graph H′. Let us consider now any infinite path π in H′ starting in the root (this path may contain finitely or infinitely many occurrences of δ, but, since x ≠ z, it cannot ultimately consist of δ's only). Let π′ be obtained from π by replacing each occurrence of δ by the sequence αβ. The resulting path π′ belongs to H, and thus its labelling belongs to W. But since the labellings of π and π′ agree after removing all the occurrences of ε, the epsilon property guarantees that the labelling of π belongs to W. Since this holds for all choices of π, we obtain L(H′) ⊆ W. Hence, by maximality, δ is an edge of H, which means that the ε-edges form a transitive relation.

Let us prove the totality. Let x and y be distinct vertices of H. We have to show that either x ε−→ y or y ε−→ x. We can turn H into a game G as follows:

- *the positions of* G *are the vertices of* H *together with a new position* z*; the position* z *is owned by the existential player, and all other positions are owned by the universal player;*
- *the moves of* G *are the edges of* H*, together with a move* (t, a, z) *for every edge of* H *of the form* (t, a, x) *or* (t, a, y)*, and the two moves* (z, ε, x) *and* (z, ε, y)*;*
- *the root is the root of* H*, and the winning condition is* W*.*

We claim first that the game G is won by the existential player. Let us construct a strategy s<sub>E</sub> in G as follows. The only moment the existential player has a choice to make is when the play reaches the position z. This has to happen after a move of the form (t, a, z). This move originates either from an edge of the form (t, a, x), or from an edge of the form (t, a, y). In the first case the strategy s<sub>E</sub> chooses the move (z, ε, x), and in the second case the move (z, ε, y). Let us consider a play π compatible with s<sub>E</sub>, and let π′ be obtained from π by replacing each occurrence of (t, a, z)(z, ε, x) with (t, a, x) and each occurrence of (t, a, z)(z, ε, y) with (t, a, y). The resulting π′ is a path in H and hence its labelling belongs to W. Since the labellings of π and π′ are equivalent up to ε-letters, by the epsilon property, the labelling of π also belongs to W. Hence the strategy s<sub>E</sub> witnesses the victory of the existential player in G. The claim is proved.

By assumption on W, this means that there exists a winning memoryless strategy S<sub>E</sub> for the existential player in G. In this strategy, either the existential player always chooses (z, ε, x), or she always chooses (z, ε, y). Up to symmetry, we can assume the first case. Let now H′ be the graph H to which a new edge δ = (y, ε, x) has been added. We claim that L(H′) ⊆ W. Let π be an infinite path in H′ starting from the root vertex. In this path, each occurrence of δ is preceded by an edge of the form (t, a, y). Thus, let π′ be obtained from π by replacing each occurrence of a sequence of the form (t, a, y)δ by (t, a, z)(z, ε, x). The resulting path is a play compatible with S<sub>E</sub>. Hence the labelling of π′ belongs to W, and as a consequence, by the epsilon property, this is also the case for π. Since this holds for all choices of π, we obtain that L(H′) ⊆ W. Hence, by the W-maximality assumption, (y, ε, x) is an edge of H.

Overall, the ε-edges form a total transitive relation.

Let ⪯<sub>ε</sub> be the least reflexive relation that extends the ε-edge relation.

**Lemma 10.** *For a winning condition* W *which is memoryless for the existential player, and* H *a* W*-maximal graph, the following properties hold:*

- *the relation* ⪯<sub>ε</sub> *is a total preorder;*
- *for all letters* a*, if* x <sup>a</sup>−→ y ε−→ z *then* x <sup>a</sup>−→ z*, and if* x ε−→ y <sup>a</sup>−→ z *then* x <sup>a</sup>−→ z*;*
- *for all vertices* x, y*:* L(H<sub>y</sub>) ⊆ L(H<sub>x</sub>) *if and only if* x ε−→ y*;*
- *for all vertices* x, y *and letters* a*:* aL(H<sub>y</sub>) ⊆ L(H<sub>x</sub>) *if and only if* x <sup>a</sup>−→ y*.*

*Proof.* The first part is obvious from Lemma 9. For the second part, it is sufficient to prove that x <sup>a</sup>−→ y ε−→ z implies x <sup>a</sup>−→ z, and that x ε−→ y <sup>a</sup>−→ z implies x <sup>a</sup>−→ z. Both cases are similar to the proof of transitivity in Lemma 9<sup>7</sup>.

The two next items are almost the same. The difficult direction is to assume the language inclusion, and deduce the existence of an edge (left to right). Let us assume for an instant that H were a finite word automaton, with all its states accepting. Then it is an obvious induction to show that if L(H<sub>q</sub>) ⊆ L(H<sub>p</sub>) (as languages of finite words), it is safe to add an ε-transition witnessing this inclusion without changing the language. The two above items are then obtained by passing to the limit (this is possible because the safety condition is topologically closed).

We are now ready to provide the missing proofs for Theorem 9: from (3) to (4), and from (3) to (1). Both implications build on Lemmas 9 and 10.

**Lemma 11.** *In the statement of Theorem 9, (3)* =⇒ *(4).*

<sup>7</sup> This arises in fact from a more general simple phenomenon: if the sequence ab is 'indistinguishable in any context' from c (meaning that if one substitutes simultaneously infinitely many occurrences of ab with occurrences of c one does not change membership in W), then x <sup>a</sup>−→ y <sup>b</sup>−→ z implies x <sup>c</sup>−→ z.

*Proof.* Let us start from a strongly (W, <sup>n</sup>)-separating safety automaton <sup>A</sup>. Without loss of generality, we can assume it is W-maximal. We claim that it is (W, n)-universal.

Let us first define, for every language K ⊆ C<sup>ω</sup>, its closure

$$\overline{K} = \bigcap_{\mathcal{L}(\mathcal{A}_s) \supseteq K} \mathcal{L}(\mathcal{A}_s)$$

(in case of an empty intersection, we take C<sup>ω</sup>). This is a closure operator: $K \subseteq K'$ implies $\overline{K} \subseteq \overline{K'}$, $K \subseteq \overline{K}$, and $\overline{\overline{K}} = \overline{K}$. Furthermore, $a\overline{K} \subseteq \overline{aK}$ for all letters a ∈ C. Let now H be a trimmed graph with at most n vertices such that L(H) ⊆ W. We have L(H) ⊆ W|<sub>n</sub> by definition of W|<sub>n</sub>.
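The behaviour of this closure operator can be illustrated on a toy finite abstraction, where languages are finite sets and the residuals are listed explicitly. This only sketches the lattice-theoretic behaviour, not ω-languages; the name `closure` is ours:

```python
# Toy sketch of the closure operator: the intersection of all listed
# "residual" languages containing K (languages modelled as finite sets).

def closure(k, residuals):
    containing = [r for r in residuals if k <= r]
    if not containing:
        return None  # stands for the whole space C^omega
    return set.intersection(*containing)

residuals = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}]
assert closure({"a"}, residuals) == {"a", "b"}
# idempotence: closing a closed set changes nothing
assert closure(closure({"a"}, residuals), residuals) == {"a", "b"}
```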

We claim that for each vertex <sup>x</sup> of <sup>H</sup>, there is a state <sup>α</sup>(x) of <sup>A</sup> such that

$$\mathcal{L}(\mathcal{A}_{\alpha(x)}) = \overline{\mathcal{L}(H_x)}\,.$$

Indeed, note first that, since H is trimmed, there exists some word u such that root<sub>H</sub> <sup>u</sup>−→ x. Hence, using the fact that A is strongly (W, n)-separating, we get that for all v ∈ L(H<sub>x</sub>), uv ∈ L(H) ⊆ W|<sub>n</sub> ⊆ L(A). Let β(v) be the state assumed after reading u by a run of A accepting uv; it is such that v ∈ L(A<sub>β(v)</sub>). Since A is finite and its states are totally ordered under inclusion of residuals (Lemma 10), this means that there exists a state α(x) (namely the maximum over all the β(v) for v ∈ L(H<sub>x</sub>)) such that $\mathcal{L}(\mathcal{A}_{\alpha(x)}) = \overline{\mathcal{L}(H_x)}$.

Let us show that <sup>α</sup> is a weak graph morphism<sup>8</sup> from <sup>H</sup> to <sup>A</sup>. Consider some edge (x, a, <sup>y</sup>) of <sup>H</sup>. We have <sup>a</sup>L(Hy) <sup>⊆</sup> <sup>L</sup>(Hx). Hence

$$a\mathcal{L}(\mathcal{A}_{\alpha(y)}) = a\overline{\mathcal{L}(H_y)} \subseteq \overline{a\mathcal{L}(H_y)} \subseteq \overline{\mathcal{L}(H_x)} = \mathcal{L}(\mathcal{A}_{\alpha(x)})\,,$$

which implies by Lemma 10 that α(x) <sup>a</sup>−→<sub>A</sub> α(y). Let now root<sub>H</sub> <sup>a</sup>−→<sub>H</sub> x be some edge. By hypothesis, we have

$$a\mathcal{L}(H_x) \subseteq \mathcal{L}(H) \subseteq \mathbb{W}|_n \subseteq \mathcal{L}(\mathcal{A})\,.$$

Thus $a\mathcal{L}(\mathcal{A}_{\alpha(x)}) = a\overline{\mathcal{L}(H_x)} \subseteq \overline{a\mathcal{L}(H_x)} \subseteq \overline{\mathcal{L}(\mathcal{A})} = \mathcal{L}(\mathcal{A}_{\mathrm{root}_{\mathcal{A}}})$. We obtain root<sub>A</sub> <sup>a</sup>−→<sub>A</sub> α(x) by Lemma 10. Hence, α is a weak graph morphism.

Since this holds for all choices of <sup>H</sup>, we have proved that <sup>A</sup> is a (W, n)-universal graph.

**Lemma 12.** *In the statement of Theorem 9, (3)* =⇒ *(1).*

*Proof.* Let us start from a strongly (W, n)-separating safety automaton A. Without loss of generality, we can assume it is W-maximal. Thus Lemma 10 applies.

<sup>8</sup> Note that in general α is not a (non-weak) graph morphism, even for conditions like parity. Even more, such a graph morphism does not exist in general.

We now construct a deterministic safety automaton D: it has the same states and root as A, and, for every state p and letter a, it keeps only the transition (p, a, q) of A whose target q is ⊑-least (such targets are totally ordered by Lemma 10).


We have to show that this deterministic safety automaton is strongly (W, n)-separating. Note first that by definition D is obtained from A by removing transitions. Hence L(D) ⊆ L(A) ⊆ W. Consider now some u ∈ W|_n. By assumption, u ∈ L(A). Let ρ = (p_0, u_1, p_1)(p_1, u_2, p_2)··· be a corresponding accepting run of A. We construct by induction a (in fact the) run (q_0, u_1, q_1)(q_1, u_2, q_2)··· of D in such a way that q_i ⊑ p_i. For the initial state, p_0 = q_0. Assume the run has been constructed up to q_i ⊑ p_i. By Lemma 10, (q_i, u_{i+1}, p_{i+1}) is a transition of A. Hence the least r such that (q_i, u_{i+1}, r) is a transition of A exists, and satisfies r ⊑ p_{i+1}. Let us call it q_{i+1}; then (q_i, u_{i+1}, q_{i+1}) is indeed a transition of D. Hence u is accepted by D. Thus W|_n ⊆ L(D).

Overall <sup>D</sup> is a strongly (W, n)-separating deterministic safety automaton that has at most as many states as A.
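The pruning step used in this proof can be sketched as follows (a minimal sketch with our own encoding, not from the paper: states are integers, ordered as residual inclusion orders them in Lemma 10, and transitions are triples).

```python
def determinize(transitions):
    """Prune a maximal safety automaton to a deterministic one, as in the
    proof of Lemma 12: for each source state p and letter a, keep only the
    transition to the least target state.
    transitions: set of triples (p, a, q) with integer states."""
    least = {}
    for p, a, q in transitions:
        key = (p, a)
        if key not in least or q < least[key]:
            least[key] = q
    return {(p, a, q) for (p, a), q in least.items()}
```

The resulting automaton has at most one a-transition per state, and by the argument above it still accepts every word of W|_n.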

#### **4 The case of parity conditions**

We have seen above some general results on the notions of universal graphs, separating automata, and automata that are good for small games. In particular, Theorem 9 shows the equivalence of these objects for winning conditions that are memoryless for the existential player.

We now pay closer attention to the particular case of the parity condition. The technical developments that follow give an alternative proof of the equivalence, established in [CDF+19], between strongly separating automata and universal trees.

#### **4.1 Parity and cycles**

We begin with a first classical lemma, which reduces the question of satisfying a parity condition to checking the parity of cycles.

In a directed graph labelled by priorities, an even cycle is a cycle (all cycles are directed) in which the maximal occurring priority is even; otherwise it is an odd cycle. As usual, an elementary cycle is a cycle that does not visit the same vertex twice.

**Lemma 13.** *For an* [i, j]*-graph* H *that has all its vertices reachable from the root, the following properties are equivalent:*

*1.* L(H) ⊆ Parity[i,j]*;*
*2. all cycles of* H *are even;*
*3. all elementary cycles of* H *are even.*

*Proof.* Clearly, since all vertices are reachable, L(H) ⊆ Parity[i,j] implies that all the cycles are even. Also, if all cycles are even, then in particular all elementary cycles are. Finally, assume that all the elementary cycles are even. We can consider H as a game in which every position is owned by the universal player. If some infinite path from the root did not satisfy Parity[i,j], then this path would be a winning strategy for the universal player in this game. Since Parity[i,j] is a winning condition memoryless for the universal player, the universal player would then have a winning memoryless strategy. But a winning memoryless strategy is nothing but a lasso, and thus contains an elementary cycle of maximal odd priority, a contradiction.
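The lemma reduces the parity condition to a property of elementary cycles. A naive executable check of that property, for small graphs and with our own encoding of priority-labelled graphs (a sketch, not from the paper), enumerates each elementary cycle once, from its minimal vertex:

```python
def all_cycles_even(edges, n):
    """Check that every elementary cycle is even, i.e. its maximal
    priority is even. edges: list of (source, priority, target);
    vertices are 0..n-1. Exponential, intended for tiny graphs."""
    adj = [[] for _ in range(n)]
    for x, k, y in edges:
        adj[x].append((k, y))

    def dfs(start, v, max_p, visited):
        for k, w in adj[v]:
            m = max(max_p, k)
            if w == start:                      # closed an elementary cycle
                if m % 2 == 1:
                    return False                # found an odd cycle
            elif w > start and w not in visited:  # canonical: min vertex first
                if not dfs(start, w, m, visited | {w}):
                    return False
        return True

    return all(dfs(s, s, 0, {s}) for s in range(n))
```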

#### **4.2 The shape and size of universal graphs for parity games**

We continue with a fixed d, and we consider parity conditions using priorities in [0, 2d]. More precisely, we relate the size of universal graphs for the parity condition with priorities in [0, 2d] to universal d-trees, as defined now:

**Definition 12.** *A* d*-tree* t *is a balanced, unranked, ordered tree of height* d *(the root does not count: all branches contain exactly* d + 1 *nodes). The order between nodes of the same level is denoted* ≤_t*. Given a leaf* x *and* i = 0, ..., d*, we denote by* anc_i^t(x) *the ancestor of* x *at depth* i *(depth* 0 *is the root, depth* d *is* x *itself).*

*The* d*-tree* t *is* n*-universal if for all* d*-trees* s *with at most* n *leaves, there is a* d*-tree embedding of* s *into* t*, where a* d*-tree embedding is an injective mapping from nodes of* s *to nodes of* t *that preserves the height of nodes, the ancestor relation, and the order of nodes. Said differently,* s *is obtained from* t *by pruning some subtrees (while keeping the structure of a* d*-tree).*

**Definition 13.** *Given a* d*-tree* t*,* Graph(t) *is a* [0, 2d]*-graph with the following characteristics:*

*– the vertices are the leaves of* t*;*
*– for* 0 ≤ i ≤ d*,* x −2(d−i)→_{Graph(t)} y *if* anc_i^t(x) ≤_t anc_i^t(y)*;*
*– for* 0 < i ≤ d*,* x −2(d−i)+1→_{Graph(t)} y *if* anc_i^t(x) <_t anc_i^t(y)*.*
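Definition 13 can be made concrete with a small sketch (our own encoding, not from the paper): each leaf of t is the tuple of child indices along its branch, anc_i(x) is the prefix x[:i], and ≤_t on same-depth nodes is lexicographic comparison.

```python
from itertools import product

def graph_of_tree(leaves, d):
    """Build Graph(t) from a d-tree t given by its leaves, each encoded as
    a length-d tuple of child indices. Returns labelled edges (x, k, y)."""
    edges = set()
    for x, y in product(leaves, repeat=2):
        for i in range(d + 1):
            if x[:i] <= y[:i]:               # anc_i(x) <=_t anc_i(y)
                edges.add((x, 2 * (d - i), y))
            if i > 0 and x[:i] < y[:i]:      # anc_i(x) <_t anc_i(y)
                edges.add((x, 2 * (d - i) + 1, y))
    return edges
```

For the 1-tree with two leaves this produces the 2d-edges between all pairs (the root prefixes always compare equal), the 0-edges following the leaf order, and a single 1-edge from the smaller leaf to the larger one.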

**Lemma 14.** *For all* <sup>d</sup>*-trees* <sup>t</sup>*,* <sup>L</sup>(Graph(t)) <sup>⊆</sup> Parity[0,2d]*.*

*Proof.* Using Lemma 13, it is sufficient to prove that all cycles in Graph(t) are even. Thus, let us consider a cycle ρ, and assume for the sake of contradiction that the highest priority occurring in ρ is some odd priority 2(d − i) + 1. Note then that for all edges α = (x, k, y) occurring in ρ:

– anc_i^t(x) ≤_t anc_i^t(y), since k ≤ 2(d − i) + 1;
– if k = 2(d − i) + 1, then anc_i^t(x) <_t anc_i^t(y).

As a consequence, the first and last vertex of ρ cannot have the same ancestor at depth i, and thus are different: ρ is not a cycle, a contradiction.

Below, we develop the results needed to establish:

**Theorem 10 ([CF18]).** *For all positive integers* d, n*, the two following quantities are equal:*

*– the least number of vertices of a* (Parity[0,2d], n)*-universal graph;*
*– the least number of leaves of an* n*-universal* d*-tree.*
*Proof.* We shall see below (Definition 14) a construction Tree that maps every Parity[0,2d]-maximal graph G to a d-tree Tree(G) of smaller or equal size. Corollary 4 establishes that this construction is in some sense the converse of Graph (in fact they form an adjunction), and that this correspondence preserves the notions of universality. This proves the above result. Given an n-universal d-tree t, by Corollary 4, Graph(t) is a (Parity[0,2d], n)-universal graph that has as many vertices as t has leaves. Conversely, consider a (Parity[0,2d], n)-universal graph G. One can add edges to it until it becomes a Parity[0,2d]-maximal graph G′ with the same vertices. Then, by Corollary 4, Tree(G′) is an n-universal d-tree that has at most as many leaves as G′ has vertices.

*Example 3.* The complete d-tree t of degree n (which has n<sup>d</sup> leaves) is n-universal. The [0, 2d]-graph Graph(t) obtained in this way is used in the small progress measure algorithm [Jur00].

However, there exist n-universal d-trees that are much smaller than in the above example. The next theorem provides an upper and a lower bound.

**Theorem 11 ([Fij18,CDF+19]).** *Given positive integers* n, d*,*

*– there exists an* n*-universal* d*-tree with*

$$n \cdot \binom{\lceil \log(n) \rceil + d - 1}{\lceil \log(n) \rceil}$$

*leaves.*

*– all* n*-universal* d*-trees have at least*

$$
\binom{\lfloor \log(n) \rfloor + d - 1}{\lfloor \log(n) \rfloor}
$$

*leaves.*
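The gap between Example 3 and Theorem 11 can be illustrated numerically; here is a small sketch (function names are ours) computing the three quantities with exact integer arithmetic:

```python
from math import comb

def ceil_log2(n):
    return (n - 1).bit_length()       # ⌈log n⌉ for n >= 1, exact

def floor_log2(n):
    return n.bit_length() - 1         # ⌊log n⌋ for n >= 1, exact

def complete_leaves(n, d):
    """Example 3: the complete d-tree of degree n has n^d leaves."""
    return n ** d

def upper_bound(n, d):
    """Theorem 11, upper bound: some n-universal d-tree has this many leaves."""
    c = ceil_log2(n)
    return n * comb(c + d - 1, c)

def lower_bound(n, d):
    """Theorem 11, lower bound on the leaves of any n-universal d-tree."""
    f = floor_log2(n)
    return comb(f + d - 1, f)
```

For n = 8 and d = 3, the complete tree has 512 leaves, while the upper bound yields 8 · C(5, 3) = 80 and the lower bound C(5, 3) = 10, showing the quasi-polynomial growth of the succinct construction against the exponential one.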

**Corollary 2.** *The complexity of solving* Parity[0,d]*-games with at most* n *vertices is*

$$O\left(mn\log(n)\log(d)\cdot \binom{\lceil\log(n)\rceil+d/2-1}{\lceil\log(n)\rceil}\right)$$

*and no algorithm based on safety automata that are good for small games can be faster than quasi-polynomial time.*

**Maximal universal graphs for the parity condition** We now analyse in detail the shape of Parity[0,2d]-maximal graphs. This analysis culminates in the precise description of such graphs in Lemma 19, which essentially establishes a bijection with graphs of the form Graph(t) (Corollary 4).

Let us note that, since the parity condition is memoryless for the existential player, using Lemma 10 and the fact that the parity condition is unchanged under modification of finite prefixes, we can always assume that the root vertex is the minimal one for the ⊑ ordering. Thus, from now on, we do not have to pay attention to the root, in particular in weak graph morphisms, and we simply say morphism for weak graph morphism.

Let us recall that the preference ordering ⪯ between the non-negative integers is defined as follows:

··· ⪯ 2d + 1 ⪯ 2d − 1 ⪯ ··· ⪯ 3 ⪯ 1 ⪯ 0 ⪯ 2 ⪯ ··· ⪯ 2d − 2 ⪯ 2d ⪯ ···

**Fact 1.** *Let* k ⪯ ℓ *be priorities and* u, v *be sequences of priorities. If the maximal priority occurring in* ukv *is even, then the maximal priority occurring in* uℓv *is also even.*
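The preference ordering admits a simple numeric key, which also lets us check Fact 1 by brute force on small priorities (a sketch, with our own function names):

```python
def pref_key(k):
    """Sort key realizing the preference ordering on priorities:
    odd priorities, in decreasing order, sit below all even priorities,
    in increasing order (... ⪯ 3 ⪯ 1 ⪯ 0 ⪯ 2 ⪯ ...)."""
    return -k if k % 2 else k

def prefers(k, l):
    """k ⪯ l in the preference ordering."""
    return pref_key(k) <= pref_key(l)
```

Replacing a priority k by a preferred priority ℓ in a sequence can only keep the maximal priority even, as the brute-force check below confirms on priorities up to 5.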

**Lemma 15.** *Let* G *be a* Parity[0,2d]*-maximal graph and* k ⪯ ℓ *be priorities in* [0, 2d]*. For all vertices* x, y *of* G*,* x −k→_G y *implies* x −ℓ→_G y*.*

*Proof.* Let us add (x, ℓ, y) to G, and let u(x, ℓ, y)v be some elementary cycle of the new graph involving the new edge (x, ℓ, y). By Lemma 13, u(x, k, y)v is an even cycle in the original graph. Hence, by Fact 1, u(x, ℓ, y)v is also an even cycle. Thus, by Lemma 13, G with the newly added edge still satisfies L(G) ⊆ Parity[0,2d]. Using the maximality assumption on G, we obtain that (x, ℓ, y) was already present in G.

**Lemma 16.** *Let* G *be a* Parity[0,2d]*-maximal graph. For all vertices* x, y, z *of* G*, if* x −k→_G y *and* y −ℓ→_G z*, then* x −max(k,ℓ)→_G z*.*

*Proof.* Let us add (x, max(k, ℓ), z) to G, and let u(x, max(k, ℓ), z)v be an elementary cycle in the new graph. By Lemma 13, u(x, k, y)(y, ℓ, z)v, being a cycle of G, has to be even. Since, furthermore, the maximal priority occurring in u(x, k, y)(y, ℓ, z)v is the same as the maximal one in u(x, max(k, ℓ), z)v, the cycle u(x, max(k, ℓ), z)v is also even. Using the maximality assumption on G, we obtain that (x, max(k, ℓ), z) was already present in G.

**Lemma 17.** *Let* G *be a* Parity[0,2d]*-maximal graph, and* x, y *be vertices. Then* x −0→_G x *and* x −2d→_G y*.*

*Proof.* For x −0→_G x, it is sufficient to notice that adding the edge (x, 0, x), if it was not already present, creates exactly one new elementary cycle in G, namely (x, 0, x). Since it is an even cycle, by Lemma 13 the new graph still satisfies L(G) ⊆ Parity[0,2d]. Hence, by the maximality assumption, the edge was already present in G.

Consider now the graph G with an extra edge (x, 2d, y) added, and an elementary cycle that contains it, i.e., of the form u(x, 2d, y)v. Its maximal priority is 2d, and is thus even. Hence, by Lemma 13 and the maximality assumption, the edge was already present in G.

**Lemma 18.** *Let* G *be a* Parity[0,2d]*-maximal graph and* k = 0, 2, ..., 2d − 2*. For all vertices* x, y*,* x −(k+1)→_G y *holds if and only if* y −k→_G x *does not hold.*

*Proof.* Assume first that x −(k+1)→_G y and y −k→_G x both hold. Then x −(k+1)→_G y −k→_G x is an odd cycle, contradicting Lemma 13.

Conversely, assume that adding the edge x −(k+1)→_G y would break the property L(G) ⊆ Parity[0,2d]. This means that there is an elementary cycle of the form u(x, k + 1, y)v which is odd. Let ℓ be the maximal priority occurring in vu, which is a path from y to x. If ℓ ≥ k + 1, then ℓ is the maximal priority of the odd cycle, so ℓ is odd, and thus ℓ ⪯ k; composing the edges of vu by Lemma 16 and applying Lemma 15, we obtain y −k→_G x. Otherwise ℓ ≤ k, and again ℓ ⪯ k: once more y −k→_G x holds by Lemmas 16 and 15.

**Lemma 19.** *A* [0, 2d]*-graph* G *is a* Parity[0,2d]*-maximal graph if and only if all the following properties hold:*

*1.* −k→_G *is a total preorder, for all even* k*;*
*2.* −k→_G ⊆ −ℓ→_G*, for all even* k ≤ ℓ*;*
*3.* x −2d→_G y*, for all vertices* x, y*;*

$$\text{4. } \stackrel{k+1}{\longrightarrow}_G = (\stackrel{k}{\leftarrow}_G)^{\complement} \text{ for all } k = 0, 2, \ldots, 2d - 2.^{\,9}$$

*Proof.* First direction. Assume first that G is a Parity[0,2d]-maximal graph.

(1) Let k = 0, 2, ..., 2d. The relation −k→_G is transitive by Lemma 16. Furthermore, by Lemma 17, x −0→_G x for all vertices x, and thus, by Lemma 15, since 0 ⪯ k, x −k→_G x. Hence −k→_G is also reflexive, and hence a preorder. Consider now two vertices x and y. By Lemma 18, either x −k→_G y or y −(k+1)→_G x. But by Lemma 15, y −(k+1)→_G x implies y −k→_G x. Hence either x −k→_G y or y −k→_G x. Thus −k→_G is a total preorder.


Second direction. Assume now that G satisfies the conditions (1)-(4). Let us first show that <sup>L</sup>(G) <sup>⊆</sup> Parity[0,2d]. For the sake of contradiction, consider an elementary cycle that would be odd. It can be written as u(x, k, y)v with a

<sup>9</sup> Note that this also means, since −k→_G is a total preorder, that −(k+1)→_G = −k→_G \ ←k−_G.

maximal odd priority k. Note first that −ℓ→_G ⊆ −(k−1)→_G for all ℓ ≤ k: indeed, by (2), this is true when ℓ is even, and, by (1) and (4), −j→_G ⊆ −(j−1)→_G for all odd j. Also, −k→_G is the strict version of the preorder −(k−1)→_G. Hence, the path u(x, k, y)v has to strictly advance with respect to the preorder −(k−1)→_G: it cannot be a cycle.

Assume now that an edge (x, k, y) is not present in G. If k is even, since (x, k, y) is not present, by (4) this means that (y, k+1, x) is present. Hence, adding the edge (x, k, y) would create the odd cycle (x, k, y)(y, k + 1, x). If k is odd, since (x, k, <sup>y</sup>) is not present, by (4) this means that (y, <sup>k</sup> <sup>−</sup> <sup>1</sup>, <sup>x</sup>) is present. Hence, adding the edge (x, k, <sup>y</sup>) would create the odd cycle (x, k, <sup>y</sup>)(y, <sup>k</sup> <sup>−</sup> <sup>1</sup>, <sup>x</sup>). Hence G is Parity[0,2d]-maximal.
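Condition (4), the complement law relating each odd priority to the converse of the even priority below it, can be checked mechanically. Here is a sketch (our own encoding, not from the paper) verifying it on the graph Graph(t) of the 1-tree with two leaves, built directly from Definition 13:

```python
from itertools import product

def check_complement_law(edges, vertices, d):
    """Condition (4) of Lemma 19 on a [0,2d]-graph:
    x --(k+1)--> y holds iff y --k--> x does not, for even k < 2d."""
    for k in range(0, 2 * d, 2):
        for x, y in product(vertices, repeat=2):
            if ((x, k + 1, y) in edges) == ((y, k, x) in edges):
                return False
    return True

# Graph(t) for the 1-tree with two leaves, following Definition 13:
leaves = [(0,), (1,)]
edges = {(x, 2, y) for x in leaves for y in leaves}           # depth-0 ancestors equal
edges |= {(x, 0, y) for x in leaves for y in leaves if x <= y}
edges |= {(x, 1, y) for x in leaves for y in leaves if x < y}
```

Removing any single edge breaks the law, reflecting that maximal graphs admit no missing edges without creating odd cycles.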

**Corollary 3.** *Given a morphism* α *from a* Parity[0,2d]*-maximal graph* H *to a* Parity[0,2d]*-maximal graph* G*, we have* x −k→_H y *if and only if* α(x) −k→_G α(y)*, for all vertices* x, y *of* H *and all integers* k *in* [0, 2d]*. Furthermore, if* α *is surjective, then every map* β *from* G *to* H *such that* α ∘ β *is the identity on* G *is an injective morphism.*

*Proof.* First part. From left to right, this is the definition of a morphism. The other direction is by (4) of Lemma 19: if α(x) −k→_G α(y) and k is odd, then α(y) −(k−1)→_G α(x) does not hold by (4), thus y −(k−1)→_H x does not hold since α is a morphism, thus x −k→_H y holds by (4) again. The case of k even is similar (using k + 1 this time).

For the second part, since <sup>α</sup> ◦ <sup>β</sup> is the identity, <sup>β</sup> has to be injective. It is a morphism by the first part.

The next definition, which allows us to go from graphs to trees, is shown to be meaningful by Lemma 19:

**Definition 14.** *Let* G *be a* Parity[0,2d]*-maximal graph. The* d*-tree* Tree(G) *is constructed as follows: its nodes at depth* i *are the equivalence classes of vertices of* G *for the equivalence relation* −2(d−i)→_G ∩ ←2(d−i)−_G*; a node at depth* i + 1 *is a child of the (unique) node at depth* i *containing it; and nodes of the same depth* i *are ordered by the total order induced by* −2(d−i)→_G*.*
We shall see that Graph and Tree are almost the inverse one of the other. This is already transparent in the following lemma, which is just a reformulation of the definitions.

**Lemma 20.** *Let* q *be the quotient map from vertices of* G *to leaves of* Tree(G) *that maps each vertex to its* (−0→_G ∩ ←0−_G)*-equivalence class. It has the following property for all vertices* x, y *of* G*:*

x −2(d−i)→_G y *if and only if* anc_i^{Tree(G)}(q(x)) ≤_{Tree(G)} anc_i^{Tree(G)}(q(y))*, and*
x −2(d−i)+1→_G y *if and only if* anc_i^{Tree(G)}(q(x)) <_{Tree(G)} anc_i^{Tree(G)}(q(y))*.*

*The identity maps the vertices of* Graph(t) *to the leaves of* t*, and has the property that for all vertices* x, y*:*

$$x \stackrel{2(d-i)}{\longrightarrow}_{\mathbf{Graph}(t)} y \quad \text{if and only if} \quad \mathbf{anc}_i^t(x) \leqslant_t \mathbf{anc}_i^t(y)\,,$$

$$\text{and} \quad x \stackrel{2(d-i)+1}{\longrightarrow}_{\mathbf{Graph}(t)} y \quad \text{if and only if} \quad \mathbf{anc}_i^t(x) <_t \mathbf{anc}_i^t(y)\,.$$

**Corollary 4.** <sup>10</sup>*For all* Parity[0,2d]*-maximal graphs* G, H*, all* d*-trees* t*, and all positive integers* n*:*

*–* Graph(Tree(G)) *is an induced subgraph of* G*;*
*–* Tree(Graph(t)) *is isomorphic to* t*;*
*– there is a morphism from* H *to* Graph(t) *if and only if there is a* d*-tree embedding of* Tree(H) *into* t*;*
*–* t *is* n*-universal if and only if* Graph(t) *is* (Parity[0,2d], n)*-universal;*
*– if* G *is* (Parity[0,2d], n)*-universal, then* Tree(G) *is* n*-universal.*
*Proof.* Let q be the quotient map from Lemma 20. It can be seen as a surjective map from the vertices of G to the vertices of Graph(Tree(G)). By Lemma 20 it is a morphism. By Corollary 3, Graph(Tree(G)) is thus an induced subgraph of G.

The leaves of Tree(Graph(t)) are the singletons consisting of the leaves of t. Hence, there is a bijective map from leaves of Tree(Graph(t)) to leaves of t that sends {ℓ} to ℓ. By Lemma 20, it is a morphism, and by Corollary 3 an isomorphism.

For the third item, assume first that there is a morphism from H to Graph(t). By the first point, there is an injective morphism from Graph(Tree(H)) to H. By composition, we obtain a morphism from Graph(Tree(H)) to Graph(t). By Lemma 20, it is also a tree embedding from Tree(H) to t. Conversely, assume that there exists an embedding from Tree(H) to t. It can be lifted by Lemma 20 to a morphism from Graph(Tree(H)) to Graph(t). By the first point, there is a morphism from H to Graph(Tree(H)). By composition, we get a morphism from H to Graph(t).

The last two items follow directly from the third one.

**Acknowledgements.** We thank Pierre Ohlmann for many interesting discussions, and Marcin Jurdziński for his comments on an earlier draft of this paper.

<sup>10</sup> The careful reader will recognize Tree and Graph as left and right adjoints.

#### **References**

	- [CF18] Colcombet, T., Fijalkow, N.: Parity games and universal graphs. CoRR, abs/1810.05106 (2018)
	- [CFH14] Colcombet, T., Fijalkow, N., Horn, F.: Playing safe. In: FSTTCS, pp. 379– 390 (2014)
	- [EJ91] Emerson, E.A., Jutla, C.S.: Tree automata, mu-calculus and determinacy (extended abstract). In: FOCS, pp. 368–377 (1991)
	- [EM79] Ehrenfeucht, A., Mycielski, J.: Positional strategies for mean payoff games. Int. J. Game Theory **8**(2), 109–113 (1979)
	- [FGO18] Fijalkow, N., Gawrychowski, P., Ohlmann, P.: The complexity of mean payoff games using universal graphs. CoRR, abs/1812.07072 (2018)
		- [Fij18] Fijalkow, N.: An optimal value iteration algorithm for parity games. CoRR, abs/1801.09618 (2018)
	- [GH82] Gurevich, Y., Harrington, L.: Trees, automata, and games. In: STOC, pp. 60–65 (1982)
	- [HP06] Henzinger, T.A., Piterman, N.: Solving games without determinization. In: Ésik, Z. (ed.) CSL 2006. LNCS, vol. 4207, pp. 395–410. Springer, Heidelberg (2006). https://doi.org/10.1007/11874683_26
	- [JL17] Jurdziński, M., Lazić, R.: Succinct progress measures for solving parity games. In: LICS, pp. 1–9 (2017)
	- [Jur00] Jurdziński, M.: Small progress measures for solving parity games. In: Reichel, H., Tison, S. (eds.) STACS 2000. LNCS, vol. 1770, pp. 290–301. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-46541-3_24
	- [Leh18] Lehtinen, K.: A modal-μ perspective on solving parity games in quasipolynomial time. In: LICS, pp. 639–648 (2018)
	- [Mar75] Martin, D.A.: Borel determinacy. Ann. Math. **102**(2), 363–371 (1975)

#### 26 T. Colcombet and N. Fijalkow

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Resource-Tracking Concurrent Games**

Aurore Alcolei(B), Pierre Clairambault, and Olivier Laurent

Université de Lyon, ENS de Lyon, CNRS, UCB Lyon 1, LIP, Lyon, France {Aurore.Alcolei,Pierre.Clairambault,Olivier.Laurent}@ens-lyon.fr

**Abstract.** We present a framework for game semantics based on concurrent games, that keeps track of *resources* as data modified throughout execution but not affecting its control flow. Our leading example is *time*, yet the construction is in fact parametrized by a *resource bimonoid* R, an algebraic structure expressing resources and the effect of their consumption either sequentially or in parallel. Relying on our construction, we give a sound resource-sensitive denotation to R-IPA, an affine higher-order concurrent programming language with shared state and a primitive for resource consumption in R. Compared with general operational semantics parametrized by R, our resource analysis turns out to be finer, leading to non-adequacy. Yet, our model is not degenerate, as adequacy holds for an operational semantics specialized to time.

In regard to earlier semantic frameworks for tracking resources, the main novelty of our work is that it is based on a non-interleaving semantics, and as such accounts for *parallel* use of resources accurately.

## **1 Introduction**

Since its inception, *denotational semantics* has grown into a very wide subject. Its developments now cover numerous programming languages or paradigms, using approaches that range from the extensionality of *domain semantics* [24] (recording the input-output behaviour) to the intensionality of *game semantics* [1,17] (recording execution traces, formalized as *plays* in a 2-player game between the program ("Player") and its execution environment ("Opponent")). Denotational semantics has had significant influence on the theory of programming languages, with contributions ranging from program logics or reasoning principles to new language constructs and verification algorithms.

Most denotational models are *qualitative* in nature, meaning that they ignore efficiency of programs in terms of time, or other resources such as power or bandwidth. To our knowledge, the first denotational model to cover time was Ghica's *slot games* [13], an extension of Ghica and Murawski's fully abstract model for a higher-order language with concurrency and shared state [14]. Slot games exploit the intensionality of game semantics and represent time via special

Supported by project Elica (ANR-14-CE25-0005) and Labex MiLyon (ANR-10-LABX-0070) of Université de Lyon, within the program "Investissements d'Avenir" (ANR-11-IDEX-0007), operated by the French National Research Agency (ANR).

M. Bojańczyk and A. Simpson (Eds.): FOSSACS 2019, LNCS 11425, pp. 27–44, 2019. https://doi.org/10.1007/978-3-030-17127-8_2

moves called *tokens* matching the *ticks* of a clock. They are fully abstract *w.r.t.* the notion of observation in Sands' operational theory of *improvement* [26].

More recently, there has been a growing interest in capturing quantitative aspects denotationally. Laird *et al.* constructed [18] an enrichment of the relational model of Linear Logic [11], using weights from a *resource semiring* given as parameter. This way, they capture in a single framework several notions of resources for extensions of PCF, ranging from time to probabilistic weights. Two type systems with similar parametrizations were introduced simultaneously by, on the one hand, Ghica and Smith [15] and, on the other hand, Brunel, Gaboardi *et al.* [4]; the latter with a quantitative realizability denotational model.

In this paper, we give a resource-sensitive denotational model for R*-IPA*, an affine higher-order programming language with concurrency, shared state, and a primitive for resource consumption. With respect to slot games, our model differs in that our resource analysis accounts for the fact that resource consumption may combine differently in parallel and sequentially – simply put, we mean to express that **wait**(1) ∥ **wait**(1) may terminate in 1 s, rather than 2. We also take inspiration from weighted relational models [18] in that our construction is parametrized by an algebraic structure representing resources and their usage. Our *resource bimonoids* ⟨R, 0, ; , ∥, ≤⟩ differ however significantly from their resource semirings ⟨R, 0, 1, +, ·⟩: while ; matches ·, ∥ is a new operation expressing the consumption of resources in parallel. We have no counterpart for the +, which agglomerates distinct non-deterministically co-existing executions leading to the same value: instead, our model keeps them separate.
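The leading example, time, can be sketched as follows (a minimal sketch with our own names, assuming the informal reading above: sequential composition adds durations while parallel composition takes their maximum):

```python
class TimeBimonoid:
    """Sketch of the time resource bimonoid: non-negative reals, with
    sequential consumption as addition and parallel consumption as max,
    so that wait(1) in parallel with wait(1) costs 1, not 2."""
    zero = 0.0

    @staticmethod
    def seq(a, b):
        """Sequential composition ';' of resource consumptions."""
        return a + b

    @staticmethod
    def par(a, b):
        """Parallel composition of resource consumptions."""
        return max(a, b)
```

Note that with these operations, zero is neutral for both compositions and parallel composition is idempotent, which is precisely what distinguishes the parallel cost analysis from the sequential one of slot games.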

Capturing parallel resource usage is technically challenging, as it can only be attempted relying on a representation of execution where parallelism is explicit. Accordingly, our model belongs to the family of *concurrent* or *asynchronous* game semantics pioneered by Abramsky and Melliès [2], pushed by Melliès [20] and later with Mimram [22], and by Faggian and Piccolo [12]; actively developed in the past 10 years prompted by the introduction of a more general framework by Rideau and Winskel [7,25]. In particular, our model is a refinement of the (qualitative) truly concurrent interpretation of *affine IPA* described in [5]. Our methodology to record resource usage is inspired by game semantics for first-order logic [3,19] where moves carry first-order terms from a signature – instead here they carry explicit *functions*, *i.e.* terms up to a congruence (it is also reminiscent of Melliès' construction of the free dialogue category over a category [21]).

As in [5] we chose to interpret an affine language: this lets us focus on the key phenomena which are already at play, avoiding the technical hindrance caused by replication. As suggested by recent experience with concurrent games [6,10], we expect the developments presented here to extend transparently in the presence of *symmetry* [8,9]; this would allow us to move to the general (non-affine) setting.

*Outline.* We start Sect. 2 by introducing the language R-IPA. We equip it first with an interleaving semantics and sketch its interpretation in slot games. We then present resource bimonoids, give a new parallel operational semantics, and hint at our truly concurrent games model. In Sect. 3, we construct this model and prove its soundness. Finally in Sect. 4, we show adequacy for an operational semantics specialized to time, noting first that the general parallel operational semantics is too coarse *w.r.t.* our model.

# **2 From** *R***-IPA to** *R***-Strategies**

#### **2.1 Affine IPA**

*Terms and Types.* We start by introducing the basic language under study, *affine Idealized Parallel Algol* (IPA). It is an affine variant of the language studied in [14], a call-by-name concurrent higher-order language with shared state. Its **types** are given by the following grammar:

> A, B ::= **com** | **bool** | **mem**_W | **mem**_R | A ⊸ B

Here, **mem**_W is the type of *writeable* references and **mem**_R is the type of *readable* references; the distinction is necessary in this affine setting as it allows sharing accesses to a given state over subprocesses; this should make more sense in the next paragraph with the typing rules. In the sequel, non-functional types are called **ground types** (for which we use the notation X). We define terms directly along with their typing rules in Fig. 1. **Contexts** are simply lists x_1 : A_1, ..., x_n : A_n of variable declarations (in which each variable occurs at most once), and the exchange rule is kept implicit. Weakening is not a rule but is admissible. We comment on a few aspects of these rules.


**Fig. 1.** Typing rules for affine IPA

Firstly, observe that the reference constructor **new** x, y **in** M binds two variables x and y, one with a write permission and the other with a read permission. In this way, the permissions of a shared state can be distributed in different components of *e.g.* an application or a parallel composition, causing interferences despite the affine aspect of the language. Secondly, the assignment command, M := **tt**, seems quite restrictive. Yet, the language is affine, so a variable can only be written to once, and, as we choose to initialize it to **ff**, the only useful thing to write is **tt**. Finally, many rules seem restrictive in that they apply only at ground type X. More general rules can be defined as syntactic sugar; for instance we give (all other constructs extend similarly): M;A-<sup>B</sup> <sup>N</sup> <sup>=</sup> λxA.( <sup>M</sup>;<sup>B</sup> (N x)).

*Operational Semantics.* We fix a countable set L of **memory locations**. Each location ℓ comes with two associated variable names ℓ_W and ℓ_R, distinct from

other variable names. Usually, stores are partial maps from L to {**tt**, **ff**}. Instead, we find it more convenient to introduce the notion of **state** of a memory location. A state corresponds to a history of memory actions (reads or writes) and follows the *state diagram* of Fig. 2 (ignoring for now the annotations with α, β). We write (M, ≤_M) for the induced set of states and the accessibility relation on it. For each m ∈ M, its set of **available actions** is act(m) = {W, R} \ m (the letters not occurring in m, annotations being ignored); and its **value** (in {**tt**, **ff**}) is val(m) = **tt** iff W occurs in m.

**Fig. 2.** State diagram

Finally, a **store** is a partial map s : L → M with finite domain, mapping each memory location to its current state. To each store corresponds a *typing context*

Ω(s) = {ℓ_X : **mem**_X | ℓ ∈ dom(s) & X ∈ act(s(ℓ))}.

The operational semantics operates on **configurations**, defined as pairs ⟨M, s⟩ with s a store and Γ ⊢ M : A a term whose free variables are all memory locations, where Γ ⊆ Ω(s). This property will be preserved by our rather standard small-step, call-by-name operational semantics. We refrain for now from giving the details; they will appear in Sect. 2.2 in the presence of resources.
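The memory-state bookkeeping just described can be sketched executably (the string encoding of histories and location names is ours, chosen for illustration):

```python
# States of a memory location: the history of actions performed so far,
# each of W (write) and R (read) occurring at most once, annotations ignored.
STATES = ["", "W", "R", "WR", "RW"]

def act(m):
    """Available actions of a state: the letters not occurring in it."""
    return {c for c in "WR" if c not in m}

def val(m):
    """Current value: tt (True) iff a write occurred in the history."""
    return "W" in m

def typing_context(store):
    """Omega(s): one variable l_X : mem_X per available action X of s(l).
    Locations are plain strings in this sketch."""
    return {loc + "_" + X: "mem_" + X
            for loc, m in store.items() for X in act(m)}
```

For instance, a location whose state records only a read still offers its write permission in the typing context, while a fully used location offers none.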

# **2.2 Interleaving Cost Semantics, and** *R***-IPA**

Ghica and Murawski [14] have constructed a *fully abstract* (for may-equivalence) model for (non-affine) IPA, relying on an extension of Hyland–Ong games [17].

Their model takes an *interleaving* view of the execution of concurrent

programs: a program is represented by the set of all its possible executions, as decided nondeterministically by the scheduler. In game semantics, this is captured by lifting the standard requirement that the two players alternate. For instance, Fig. 3 shows a *play* in the interpretation of the program x : **com**, y : **bool** ⊢ x ∥ y : **bool**. The diagram is read from top to bottom, chronologically. Each line

**Fig. 3.** A non-alternating play

comprises one computational event ("move"), annotated with "−" if due to the execution environment ("Opponent") and with "+" if due to the program ("Player"); each move corresponds to a certain type component, under which it is placed. With the first move **q**<sup>−</sup>, the environment initiates the computation. Player then plays **run**<sup>+</sup>, triggering the evaluation of x. In standard game semantics, control would then go back to the execution environment – Player would be stuck until Opponent plays. Here instead, due to parallelism, Player can play a second move **q**<sup>+</sup> immediately. At this point of the execution, x and y are both running in parallel. Only when they have both returned (moves **done**<sup>−</sup> and **tt**<sup>−</sup>) is Player able to respond **tt**<sup>+</sup>, terminating the computation. The full interpretation of x : **com**, y : **bool** ⊢ x ∥ y : **bool**, its *strategy*, comprises numerous plays like this one, one for each interleaving.

As often in denotational semantics, Ghica and Murawski's model is invariant under reduction: if ⟨M, s⟩ → ⟨M′, s′⟩, both configurations have the same denotation. The model adequately describes the result of computation, but not its *cost* in terms of, for instance, time. Of course this cost is not yet specified: one must, for instance, define a *cost model* assigning a cost to all basic operations (*e.g.* memory operations, function calls, *etc.*). In this paper we instead enrich the language with a primitive for *resource consumption* – cost models can then be captured by inserting this primitive concomitantly with the costly operations (see for example [18]).

R*-IPA.* Consider a set R of **resources**. The language R-IPA is obtained by adding to affine IPA a new construction, **consume**(α), typed as in Fig. 4. When evaluated, **consume**(α) triggers the consumption of resource α ∈ R. Time consumption will be a running example throughout the paper. In that case, we take the non-negative reals R<sup>+</sup> as the set R, and for t ∈ R<sup>+</sup> we use **wait**(t) as a synonym for **consume**(t).

$$\frac{(\alpha \in \mathcal{R})}{\varGamma \vdash \mathbf{consume}(\alpha) : \mathbf{com}}$$

**Fig. 4.** Typing **consume**

⟨**skip**; M, s, α⟩ → ⟨M, s, α⟩
⟨**skip** ∥ M, s, α⟩ → ⟨M, s, α⟩
⟨M ∥ **skip**, s, α⟩ → ⟨M, s, α⟩
⟨**if tt** N₁ N₂, s, α⟩ → ⟨N₁, s, α⟩
⟨**if ff** N₁ N₂, s, α⟩ → ⟨N₂, s, α⟩
⟨(λx. M) N, s, α⟩ → ⟨M[N/x], s, α⟩
⟨!ℓ<sub>R</sub>, s, α⟩ → ⟨val(s(ℓ)), s[ℓ ↦ s(ℓ).R<sup>α</sup>], α⟩
⟨ℓ<sub>W</sub> := **tt**, s, α⟩ → ⟨**skip**, s[ℓ ↦ s(ℓ).W<sup>α</sup>], α⟩
⟨**new** x, y **in** M, s, α⟩ → ⟨M[ℓ<sub>W</sub>/x, ℓ<sub>R</sub>/y], s ⊎ {ℓ ↦ ε}, α⟩
⟨**consume**(β), s, α⟩ → ⟨**skip**, s, α; β⟩

**Fig. 5.** Operational semantics: basic rules

To equip R-IPA with an operational semantics we need some operations on R; they are introduced throughout this section. First we have 0 ∈ R, the null resource; and if α, β ∈ R, we have α; β ∈ R, the resource taken by consuming α, then β – for R = R<sup>+</sup>, this is simply addition. To evaluate R-IPA, the **configurations** are now triples ⟨M, s, α⟩ with α ∈ R tracking the resources already spent. With that, we give in Fig. 5 the basic operational rules. The only rule affecting the current resource is that for **consume**(β); the others leave it unchanged. Note, however, that we record the current resource when performing memory operations, explaining the annotations in Fig. 2. These annotations do not impact the operational behaviour, but will be helpful in relating with the game semantics in Sect. 3. As usual, these rules apply within call-by-name evaluation contexts – we omit the details here, but they will appear for our final operational semantics.

*Slot Games.* In [13], Ghica extends Ghica and Murawski's model to *slot games* in order to capture resource consumption. Slot games introduce a new action called a *token*, representing an atomic resource consumption and written \$; we write \$<sup>n</sup> for n successive occurrences of \$. A model of N<sup>+</sup>-IPA using slot games would have for instance the play in Fig. 6 in the interpretation of

$$H = (\mathbf{wait}(1); x; \mathbf{wait}(2)) \parallel (\mathbf{wait}(2); y; \mathbf{wait}(1))$$

in context x : **com**, y : **bool**, among many others. (In examples, we use a more liberal typing rule for ';', allowing y<sup>**bool**</sup>; z<sup>**com**</sup> : **bool** to avoid clutter: it can be encoded as **if** y (z; **tt**) (z; **ff**).) Following the methodology of game semantics, the interpretation of (λxy. H) **skip tt** would yield, by composition, the strategy with only maximal play **q**<sup>−</sup> \$<sup>6</sup> **tt**<sup>+</sup>, where \$<sup>6</sup> reflects the overall 6 time units (say "seconds") that have to pass before we see the result (3 in each thread). This seems wasteful, but it is indeed an adequate computational analysis, because both slot games and the operational semantics given so far implicitly assume a sequential operational model, *i.e.* that both threads compete to be scheduled on a *single* processor. Let us now question that assumption.

**Fig. 6.** A play with tokens

*Parallel Resource Consumption.* With a truly concurrent evaluation in mind, we should be able to prove that the program above may terminate in 3 seconds rather than 6, as nothing prevents the threads from evaluating in parallel. Before we update the operational semantics to express that, we enrich our resource structure so that it can express the effect of consuming resources in parallel.

We now introduce the full algebraic structure we require for resources.

**Definition 1.** *A* resource bimonoid *is* ⟨R, 0, ;, ∥, ≤⟩ *where* ⟨R, 0, ;, ≤⟩ *is an ordered monoid,* ⟨R, 0, ∥, ≤⟩ *is an ordered commutative monoid,* 0 *is bottom for* ≤*, and* ∥ *is idempotent,* i.e. *it satisfies* α ∥ α = α*.*

A resource bimonoid is in particular a *concurrent monoid* in the sense of *e.g.* [16] (though we take ≤ in the opposite direction: we read α ≤<sub>R</sub> α′ as "α is *better*/*more efficient* than α′"). Our idempotence assumption is rather strong, as it entails that α ∥ β is the supremum of α and β in R. This allows us to recover a number of simple laws, *e.g.* α ∥ β ≤ α; β, or the exchange rule (α; β) ∥ (α′; β′) ≤ (α ∥ α′); (β ∥ β′). Idempotence, which would not be needed for a purely functional language, is used crucially in our interpretation of state.

Our leading examples are ⟨N<sup>+</sup>, 0, +, max, ≤⟩ and ⟨R<sup>+</sup>, 0, +, max, ≤⟩ – we call the latter the *time bimonoid*. Others are the *permission bimonoid* ⟨P(P), ∅, ∪, ∪, ⊆⟩ for some set P of *permissions*: if reaching a state requires certain permissions, it does not matter whether these have been requested sequentially or in parallel; and the bimonoid of *parametrized time* ⟨M, 0, ;, ∥, ≤⟩ with M the monotone functions from non-negative reals to non-negative reals, 0 the constant zero function, ∥ the pointwise maximum, and (f; g)(x) = f(x) + g(x + f(x)): it tracks time consumption in a context where the time taken by **consume**(α) might grow over time.
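These laws are easy to sanity-check concretely. The following Python sketch encodes the three bimonoids above on sample elements; the names `t_seq`, `t_par`, `p_seq`, `pt_seq`, etc. are ours, and the parametrized-time bimonoid is only checked pointwise.

```python
# Three resource bimonoids <R, 0, ;, ||, <=>; sanity-check on samples the
# laws quoted above: idempotence, a || b <= a; b, and the exchange rule.

# Time bimonoid: sequential composition is +, parallel composition is max.
t_seq = lambda a, b: a + b
t_par = lambda a, b: max(a, b)

a, b, a2, b2 = 1.0, 2.0, 2.0, 1.0
assert t_par(a, a) == a                         # idempotence: a || a = a
assert t_par(a, b) <= t_seq(a, b)               # a || b <= a; b
# exchange: (a;b) || (a2;b2) <= (a||a2); (b||b2)
assert t_par(t_seq(a, b), t_seq(a2, b2)) <= t_seq(t_par(a, a2), t_par(b, b2))

# Permission bimonoid P(P): both compositions are union, order is inclusion.
p_seq = p_par = lambda A, B: A | B
assert p_par({"rd"}, {"rd"}) == {"rd"}          # idempotent
assert p_seq({"rd"}, {"wr"}) == p_par({"rd"}, {"wr"})

# Parametrized time: monotone functions R+ -> R+, (f;g)(x) = f(x) + g(x+f(x)),
# (f || g)(x) = max(f(x), g(x)); checked pointwise on a sample function.
pt_seq = lambda f, g: (lambda x: f(x) + g(x + f(x)))
pt_par = lambda f, g: (lambda x: max(f(x), g(x)))
f = lambda x: 1 + x / 10                        # waits grow over time
assert pt_par(f, f)(5.0) == f(5.0)              # idempotent pointwise
assert pt_par(f, f)(0.0) <= pt_seq(f, f)(0.0)   # f || f <= f; f at 0
```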

Besides time-based bimonoids, it would be appealing to cover resources such as *power*, *bandwidth* or *heap space*. Those, however, clearly fail idempotence of ∥, and are therefore not covered. It is not clear how to extend our model to them.

$$\langle M,s,\alpha\rangle \Rightarrow \langle M,s,\alpha\rangle \qquad \frac{\langle M,s,\alpha\rangle \rightarrow \langle M',s',\alpha'\rangle}{\langle M,s,\alpha\rangle \Rightarrow \langle M',s',\alpha'\rangle} \qquad \frac{\langle M,s,\alpha\rangle \Rightarrow \langle M',s',\alpha'\rangle}{\langle C[M],s,\alpha\rangle \Rightarrow \langle C[M'],s',\alpha'\rangle}$$

$$\frac{\langle M,s,\alpha\rangle \Rightarrow \langle M',s',\alpha'\rangle \qquad \langle M',s',\alpha'\rangle \Rightarrow \langle M'',s'',\alpha''\rangle}{\langle M,s,\alpha\rangle \Rightarrow \langle M'',s'',\alpha''\rangle} \qquad \frac{\langle M,s,\alpha\rangle \Rightarrow \langle M',s'\_1,\alpha\_1\rangle \qquad \langle N,s,\alpha\rangle \Rightarrow \langle N',s'\_2,\alpha\_2\rangle}{\langle M \parallel N,s,\alpha\rangle \Rightarrow \langle M' \parallel N',s'\_1 \uparrow s'\_2,\alpha\_1 \parallel \alpha\_2\rangle}$$

**Fig. 7.** Rules for parallel reduction

*Parallel Operational Semantics.* Let us fix a resource bimonoid R. To express parallel resource consumption, we use the many-step *parallel reductions* defined in Fig. 7, with **call-by-name evaluation contexts** given by

C[] ::= [] | [] N | []; N | **if** [] N₁ N₂ | [] := **tt** | ![] | ([] ∥ N) | (M ∥ [])

The rule for parallel composition carries some restrictions regarding memory: M and N can only reduce concurrently if they do not access the same memory cells. This is achieved by requiring that the *partial* operation s ↑ s′ – which intuitively corresponds to "merging" two memory stores s and s′ whenever there are no conflicts – is defined. More formally, the partial order ≤<sub>M</sub> on memory states induces a partial order (also written ≤<sub>M</sub>) on stores, defined by s ≤<sub>M</sub> s′ iff dom(s) ⊆ dom(s′) and for all ℓ ∈ dom(s) we have s(ℓ) ≤<sub>M</sub> s′(ℓ). This order is a cpo in which s and s′ are *compatible* (*i.e.* have an upper bound) iff for all ℓ ∈ dom(s) ∩ dom(s′), s(ℓ) ≤<sub>M</sub> s′(ℓ) or s′(ℓ) ≤<sub>M</sub> s(ℓ) – so there has been no interference going to s and s′ from their last common ancestor. When s and s′ are compatible, s ↑ s′ is their lub; it is undefined otherwise.
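A minimal sketch of this merge in Python, encoding memory states as strings of actions (so that the order on states becomes the string-prefix order) and ignoring resource annotations; the names `leq_state` and `merge` are ours.

```python
# Merge of compatible stores s ^ s': memory states are ordered by the prefix
# order on action strings, stores pointwise; the merge takes the
# location-wise maximum when it exists and is undefined (None) otherwise.

def leq_state(m1, m2):
    return m2.startswith(m1)            # m1 <=_M m2: m1 is a stage of m2

def merge(s1, s2):
    """s1 ^ s2: pointwise lub, or None when the stores conflict."""
    out = dict(s1)
    for loc, m2 in s2.items():
        m1 = out.get(loc, "")
        if leq_state(m1, m2):
            out[loc] = m2
        elif not leq_state(m2, m1):
            return None                 # interference on loc: no upper bound
    return out

# Thread 1 wrote l; thread 2 allocated and read k: no conflict.
assert merge({"l": "W"}, {"k": "R"}) == {"l": "W", "k": "R"}
# Both threads acted on l in incompatible orders: merge undefined.
assert merge({"l": "RW"}, {"l": "WR"}) is None
```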

For ⊢ M : **com**, we set M ⇓<sub>α</sub> if ⟨M, ∅, 0⟩ ⇒ ⟨**skip**, s, α⟩. For instance, instantiating the rules with the time bimonoid, we have

$$(\mathbf{wait}(1); \mathbf{wait}(2)) \parallel (\mathbf{wait}(2); \mathbf{wait}(1)) \Downarrow\_3$$
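The gap between the two readings can be computed directly. Below is a tiny cost evaluator for wait-only terms (our own encoding): `par_is_max=False` plays the role of the single-processor model of Sect. 2.2, where both `;` and `||` consume time serially, while `par_is_max=True` combines `||`-branches with the time bimonoid's max.

```python
# Cost of a wait-only term under the two operational readings.
def cost(term, par_is_max):
    op = term[0]
    if op == "wait":
        return term[1]
    left = cost(term[1], par_is_max)
    right = cost(term[2], par_is_max)
    if op == ";" or not par_is_max:
        return left + right             # sequential / interleaved model
    return max(left, right)             # op == "||", truly concurrent

W = lambda t: ("wait", t)
H = ("||", (";", W(1), W(2)), (";", W(2), W(1)))

assert cost(H, par_is_max=False) == 6   # slot-games / single-processor cost
assert cost(H, par_is_max=True) == 3    # parallel cost: H may terminate in 3
```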

#### **2.3 Non-interleaving Semantics**

To capture this parallel resource usage semantically, we build on the games model for affine IPA presented in [5]. Rather than presenting programs as collections of *sequences* of moves expressing all observable sequences of computational actions, this model adopts a *truly concurrent* view using collections of *partially ordered* plays. For each Player move, the order specifies its *causal dependencies*, *i.e.* the Opponent moves that need to have happened before. For instance, ignoring the subscripts, Fig. 8 displays a typical partially ordered play in the strategy for the term H of Sect. 2.2. One partially ordered play does not fully specify a sequential execution: that in Fig. 8 stands for *many* sequential executions, one of which is in Fig. 3. Behaviours expressed by partially ordered plays are deterministic *up to* choices of the scheduler irrelevant for the eventual result. Because R-IPA is nondeterministic (via concurrency and shared state), our strategies will be *sets* of such partial orders.

**Fig. 8.** A parallel <sup>R</sup>-play

To express resources, we leverage the causal information and indicate, in each partially ordered play and for each positive move, an R-expression representing its *additional cost* as a function of the cost of its negative dependencies. Figure 8 displays such an R*-play*: each Opponent move introduces a fresh variable, which can be used in annotations for Player moves. As we will see further on, once applied to strategies for values **skip** and **tt** (with no additional cost), this R-play will answer the initial Opponent move **q**<sup>−</sup><sub>x</sub> with **tt**<sup>+</sup><sub>x; α</sub> where α = (1; 2) ∥ (2; 1) =<sub>R<sup>+</sup></sub> 3, as prescribed by the more efficient parallel operational semantics.

We now go on to define formally our semantics.

#### **3 Concurrent Game Semantics of IPA**

### **3.1 Arenas and** *R***-Strategies**

*Arenas.* We first introduce *arenas*, the semantic representation of types in our model. As in [5], an arena will be a certain kind of *event structure* [27].

**Definition 2.** *An event structure comprises* (E, <sup>≤</sup><sup>E</sup>, #E) *where* E *is a set of* events*,* ≤<sup>E</sup> *is a partial order called* causal dependency*, and* #<sup>E</sup> *is an irreflexive symmetric binary relation called* conflict*, subject to the two axioms:*

$$\begin{array}{l} \forall e \in E, [e]\_E = \{e' \in E \mid e' \leq\_E e\} \ is \ finite \\ \forall e\_1 \ \#\_E e\_2, \forall e\_1 \leq\_E e'\_1, e'\_1 \ \#\_E e\_2 \end{array}$$

We will use some vocabulary and notation from event structures. A **configuration** x ⊆ E is a down-closed, consistent (*i.e.* for all e, e′ ∈ x, ¬(e #<sub>E</sub> e′)) finite set of events. We write *C*(E) for the set of configurations of E. We write ⋖<sub>E</sub> for **immediate causality**, *i.e.* e ⋖<sub>E</sub> e′ iff e <<sub>E</sub> e′ with nothing in between – this is the relation represented in diagrams such as Fig. 8. A conflict e₁ #<sub>E</sub> e₂ is **minimal** if for all e′₁ <<sub>E</sub> e₁, ¬(e′₁ #<sub>E</sub> e₂), and symmetrically. We write e₁ ∼<sub>E</sub> e₂ to indicate that e₁ and e₂ are in minimal conflict.
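For readers who prefer an operational check, here is a small Python sketch of the configuration conditions; the encoding of causality as a predecessor map and of conflict as a set of unordered pairs is ours.

```python
# Configurations of an event structure (E, <=, #): finite sets that are
# down-closed and conflict-free.
from itertools import combinations

def down_closed(x, pred):
    return all(p in x for e in x for p in pred.get(e, ()))

def consistent(x, conflict):
    return not any(frozenset(p) in conflict for p in combinations(x, 2))

def is_configuration(x, pred, conflict):
    return down_closed(x, pred) and consistent(x, conflict)

# The arena bool: q- below both tt+ and ff+, which are in minimal conflict.
pred = {"tt": {"q"}, "ff": {"q"}}
conflict = {frozenset({"tt", "ff"})}

assert is_configuration({"q", "tt"}, pred, conflict)
assert not is_configuration({"tt"}, pred, conflict)             # not down-closed
assert not is_configuration({"q", "tt", "ff"}, pred, conflict)  # conflicting
```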

With this, we now define arenas.

**Definition 3.** *An* arena *is* (A, ≤<sub>A</sub>, #<sub>A</sub>, pol<sub>A</sub>)*, an event structure along with a* polarity function pol<sub>A</sub> : A → {−, +} *subject to:* (1) ≤<sub>A</sub> *is forest-shaped,* (2) ⋖<sub>A</sub> *is alternating: if* a₁ ⋖<sub>A</sub> a₂*, then* pol<sub>A</sub>(a₁) ≠ pol<sub>A</sub>(a₂)*, and* (3) *it is race-free,* i.e. *if* a₁ ∼<sub>A</sub> a₂*, then* pol<sub>A</sub>(a₁) = pol<sub>A</sub>(a₂)*.*

Arenas present the computational actions available on a type, following a call-by-name evaluation strategy. For instance, the observable actions of a closed

**Fig. 9.** An arena for a sequent

term on **com** are that it can be run, and that it may terminate, leading to the arena **com** = **run**<sup>−</sup> ⋖ **done**<sup>+</sup>. Likewise, a boolean can be evaluated, and can terminate on **tt** or **ff**, yielding the arena on the right of Fig. 9 (when drawing arenas, immediate causality is drawn with a dotted line, from top to bottom). We present some simple arena constructions. The **empty arena**, written 1, has no events. If A is an arena, then its

**dual** A<sup>⊥</sup> has the same components, but polarity reversed. The **parallel composition** of A and B, written A ∥ B, has as events the tagged disjoint union {1} × A ∪ {2} × B, with all other components inherited. For x<sub>A</sub> ∈ *C*(A) and x<sub>B</sub> ∈ *C*(B), we also write x<sub>A</sub> ∥ x<sub>B</sub> ∈ *C*(A ∥ B). Figure 9 displays the arena **com**<sup>⊥</sup> ∥ **bool**<sup>⊥</sup> ∥ **bool**.

R*-Augmentations.* As hinted before, R-strategies will be collections of partially ordered plays with resource annotations in R, called R*-augmentations*.

**Definition 4.** *An* augmentation *[5] on arena* A *is a finite partial order* q = (|q|, ≤<sub>q</sub>) *such that C*(q) ⊆ *C*(A) *(concerning configurations, augmentations are considered as event structures with empty conflict), which is* courteous*, in the sense that for all* a₁ ⋖<sub>q</sub> a₂*, if* pol<sub>A</sub>(a₁) = + *or* pol<sub>A</sub>(a₂) = −*, then* a₁ ⋖<sub>A</sub> a₂*.*

*An* R*-augmentation also has (with* [a]<sup>−</sup><sub>q</sub> = {a′ ≤<sub>q</sub> a | pol<sub>A</sub>(a′) = −}*)*

$$\lambda\_{\mathbf{q}} : (a \in |\mathbf{q}|) \longrightarrow \left(\mathcal{R}^{[a]^{-}\_{\mathbf{q}}} \to \mathcal{R}\right),$$

*such that if* polA(a) = <sup>−</sup>*, then* <sup>λ</sup><sup>q</sup>(a)(ρ) = <sup>ρ</sup><sup>a</sup>*, the projection on* <sup>a</sup> *of* <sup>ρ</sup> ∈ R[a] − q *, and for all* a ∈ |q|*,* λ<sup>q</sup>(a) *is monotone* w.r.t. *all of its variables.*

*We write* <sup>R</sup>*-*Aug(A) *for the set of* <sup>R</sup>*-augmentations on* A*.*

If q, q′ ∈ R-Aug(A), q is **rigidly embedded** in q′, or a **prefix** of q′, written q ↪ q′, if |q| ∈ *C*(q′), for all a, a′ ∈ |q|, a ≤<sub>q</sub> a′ iff a ≤<sub>q′</sub> a′, and for all a ∈ |q|, λ<sub>q</sub>(a) = λ<sub>q′</sub>(a). The R*-plays* of Sect. 2.3 are formalized as R-augmentations: Fig. 8 presents an R-augmentation on the arena of Fig. 9. The functional dependency in the annotation of positive events is represented using the free variables introduced alongside negative events; however, this is only a symbolic representation: the formal annotation is a function for each positive event. In the model of R-IPA, we will only use the particular case where the annotations of positive events depend only on the annotations of their immediate predecessors.

R*-Strategies.* We start by defining R-strategies on arenas.

**Definition 5.** *An* R*-strategy on* A *is a non-empty prefix-closed set of* R*-augmentations* σ ⊆ R*-*Aug(A) *which is* receptive *[5]: for* q ∈ σ *such that* |q| *extends with* a<sup>−</sup> ∈ A *(*i.e. pol(a) = −*,* a ∉ |q|*, and* |q| ∪ {a} ∈ *C*(A)*), there is* q ↪ q′ ∈ σ *such that* |q′| = |q| ∪ {a}*.*

*If* σ *is an* R*-strategy on arena* A*, we write* σ : A*.*

Observe that R-strategies are fully described by their *maximal* augmentations, *i.e.* those that are the prefix of no other augmentation in the strategy. Our interpretation of **new** will use the R-strategy cell : **mem**<sub>W</sub> ∥ **mem**<sub>R</sub> (with arenas presented in Fig. 10), comprising all the R-augmentations rigidly embedded in either of the two from Fig. 11. These two capture the race when reading and writing simultaneously: if both **wtt**<sup>−</sup> and **r**<sup>−</sup> are played, the read may return **tt**<sup>+</sup> or **ff**<sup>+</sup>, but it can only return **tt**<sup>+</sup> in the presence of **wtt**<sup>−</sup>.

**Fig. 10. mem***<sup>W</sup>* and **mem***<sup>R</sup>*

**Fig. 11.** Maximal <sup>R</sup>-augmentations of cell

# **3.2 Interpretation of** *R***-IPA**

*Categorical Structure.* In order to define the interpretation of terms of R-IPA as R-strategies, a key step is to show how to form a *category* of R-strategies. To do that we follow the standard idea of considering R**-strategies from** A **to** B to be simply R-strategies on the compound arena A<sup>⊥</sup> ∥ B. As usual, our first example of an R-strategy between arenas is the *copycat* R*-strategy*.

**Definition 6.** *Let* A *be an arena. We define a partial order* ≤<sub>CC<sub>A</sub></sub> *on* A<sup>⊥</sup> ∥ A*:*

$$\leq\_{\mathrm{CC}\_A} = \left( \begin{array}{l} \{ ((1,a),(1,a')) \mid a \leq\_A a' \} \cup \{ ((2,a),(2,a')) \mid a \leq\_A a' \} \\ \cup\, \{ ((1,a),(2,a)) \mid \mathrm{pol}\_A(a) = + \} \cup \{ ((2,a),(1,a)) \mid \mathrm{pol}\_A(a) = - \} \end{array} \right)^+$$

*where* (−)<sup>+</sup> *denotes the transitive closure of a relation. Note that if* a ∈ A<sup>⊥</sup> ∥ A *is positive, it has a unique immediate predecessor* pred(a) ∈ A<sup>⊥</sup> ∥ A *for* ≤<sub>CC<sub>A</sub></sub>*.*

*If* x ∥ y ∈ *C*(A<sup>⊥</sup> ∥ A) *is down-closed for* ≤<sub>CC<sub>A</sub></sub> *(write* ≤<sub>x,y</sub> *for the restriction of* ≤<sub>CC<sub>A</sub></sub> *to* x ∥ y*), we define an* R*-augmentation* q<sub>x,y</sub> = (x ∥ y, ≤<sub>x,y</sub>, λ<sub>x,y</sub>) *where*

$$\lambda\_{x,y} : (a \in x \parallel y) \quad \longrightarrow \quad \left(\mathcal{R}^{[a]^{-}\_{x\parallel y}} \to \mathcal{R}\right).$$

*with* λ<sub>x,y</sub>(a<sup>−</sup>)(ρ) = ρ<sub>a</sub>*, and* λ<sub>x,y</sub>(a<sup>+</sup>)(ρ) = ρ<sub>pred(a)</sub>*. Then,* CC<sub>A</sub> *is the* R*-strategy comprising all* q<sub>x,y</sub> *for* x ∥ y ∈ *C*(A<sup>⊥</sup> ∥ A) *down-closed for* ≤<sub>CC<sub>A</sub></sub>*.*

We first define *interactions* of R-augmentations, extending [5].

**Definition 7.** *We say that* q ∈ R*-*Aug(A<sup>⊥</sup> ∥ B) *and* p ∈ R*-*Aug(B<sup>⊥</sup> ∥ C) *are* causally compatible *if* |q| = x<sub>A</sub> ∥ x<sub>B</sub>*,* |p| = x<sub>B</sub> ∥ x<sub>C</sub>*, and the preorder* ≤<sub>p⊛q</sub> *on* x<sub>A</sub> ∥ x<sub>B</sub> ∥ x<sub>C</sub> *defined as* (≤<sub>q</sub> ∪ ≤<sub>p</sub>)<sup>+</sup> *is a partial order.*

*Say* e ∈ x<sub>A</sub> ∥ x<sub>B</sub> ∥ x<sub>C</sub> *is* negative *if it is negative in* A<sup>⊥</sup> ∥ C*. We define*

$$\lambda\_{\mathsf{p}\circledast \mathsf{q}} : (e \in x\_A \parallel x\_B \parallel x\_C) \longrightarrow \left(\mathcal{R}^{[e]^{-}\_{\mathsf{p}\circledast \mathsf{q}}} \to \mathcal{R}\right)$$

*as follows, by well-founded induction on* <<sub>p⊛q</sub>*, for* ρ ∈ R<sup>[e]<sup>−</sup><sub>p⊛q</sub></sup>*:*

$$
\lambda\_{\mathsf{p}\circledast\mathsf{q}}(e)(\rho) = \begin{cases}
\lambda\_{\mathsf{p}}(e)\left(\langle\lambda\_{\mathsf{p}\circledast\mathsf{q}}(e')(\rho)\mid e' \in [e]^{-}\_{\mathsf{p}}\rangle\right) & \text{if } \mathrm{pol}\_{B^{\perp}\parallel C}(e) = +,\\
\lambda\_{\mathsf{q}}(e)\left(\langle\lambda\_{\mathsf{p}\circledast\mathsf{q}}(e')(\rho)\mid e' \in [e]^{-}\_{\mathsf{q}}\rangle\right) & \text{if } \mathrm{pol}\_{A^{\perp}\parallel B}(e) = +,\\
\rho\_{e} & \text{otherwise, i.e. } e \text{ negative.}
\end{cases}
$$

*The* interaction p ⊛ q *of causally compatible* q, p *is* (x<sub>A</sub> ∥ x<sub>B</sub> ∥ x<sub>C</sub>, ≤<sub>p⊛q</sub>, λ<sub>p⊛q</sub>)*.*

If σ : A<sup>⊥</sup> ∥ B and τ : B<sup>⊥</sup> ∥ C, we write τ ⊛ σ for the set comprising all p ⊛ q such that p ∈ τ and q ∈ σ are causally compatible. For q ∈ σ and p ∈ τ causally compatible with |p ⊛ q| = x<sub>A</sub> ∥ x<sub>B</sub> ∥ x<sub>C</sub>, their **composition** is p ⊙ q = (x<sub>A</sub> ∥ x<sub>C</sub>, ≤<sub>p⊙q</sub>, λ<sub>p⊙q</sub>), where ≤<sub>p⊙q</sub> and λ<sub>p⊙q</sub> are the restrictions of ≤<sub>p⊛q</sub> and λ<sub>p⊛q</sub> to x<sub>A</sub> ∥ x<sub>C</sub>. Finally, the **composition** τ ⊙ σ of σ : A<sup>⊥</sup> ∥ B and τ : B<sup>⊥</sup> ∥ C is the set comprising all p ⊙ q for q ∈ σ and p ∈ τ causally compatible.

**Fig. 12.** Example of interaction and composition between <sup>R</sup>+-augmentations

In Fig. 12, we display an example composition between R<sup>+</sup>-augmentations – with the underlying interaction in gray. The reader may check that the variant of the left R<sup>+</sup>-augmentation with **tt** replaced by **ff** is causally compatible with the other augmentation in Fig. 11, with composition **q**<sup>−</sup><sub>x</sub> ⋖ **ff**<sup>+</sup><sub>x; 4</sub>.
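The well-founded recursion of Definition 7 is easy to implement for a fixed interaction. The sketch below uses a scenario of our own (not one from the paper's figures), in the time bimonoid: an augmentation q that delays both moves of a copycat on **com** by 1, composed with an augmentation p delaying them by 2; `annotation` computes λ<sub>p⊛q</sub>.

```python
# Annotations of an interaction p (*) q: an event positive in q (resp. p)
# applies that augmentation's annotation function to the recursively
# computed annotations of its negative dependencies; an event negative in
# A^perp || C reads its value off the environment rho.
events = {                # event -> (owner, deps, annotation function)
    "run_C":  ("neg", [], None),
    "run_B":  ("p", ["run_C"],  lambda d: d["run_C"] + 2),   # delay 2 in p
    "run_A":  ("q", ["run_B"],  lambda d: d["run_B"] + 1),   # delay 1 in q
    "done_A": ("neg", [], None),
    "done_B": ("q", ["done_A"], lambda d: d["done_A"] + 1),
    "done_C": ("p", ["done_B"], lambda d: d["done_B"] + 2),
}

def annotation(e, rho):
    owner, deps, fn = events[e]
    if owner == "neg":
        return rho[e]                   # lambda(e)(rho) = rho_e
    return fn({d: annotation(d, rho) for d in deps})

rho = {"run_C": 0.0, "done_A": 5.0}
assert annotation("run_A", rho) == 3.0      # 0; 2; 1 = 3
assert annotation("done_C", rho) == 8.0     # 5; 1; 2 = 8
```

Hiding the B-events then yields the composition's visible annotations, as in Fig. 12.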

We also have a tensor operation: on arenas, A ⊗ B is simply a synonym for A ∥ B. If q₁ ∈ R-Aug(A₁<sup>⊥</sup> ∥ B₁) and q₂ ∈ R-Aug(A₂<sup>⊥</sup> ∥ B₂), their **tensor product** q₁ ⊗ q₂ ∈ R-Aug((A₁ ⊗ A₂)<sup>⊥</sup> ∥ (B₁ ⊗ B₂)) is defined in the obvious way. This is lifted to R-strategies element-wise. As is common when constructing basic categories of games and strategies, we have:

**Proposition 1.** *There is a compact closed category* R*-*Strat *having arenas as objects, and as morphisms,* R*-strategies between them.*

*Negative Arenas and* R*-Strategies.* As a compact closed category, R-Strat is a model of the linear λ-calculus. However, we will (as usual for call-by-name) instead interpret R-IPA in a sub-category of *negative* arenas and strategies, in which the empty arena 1 is terminal, providing the interpretation of weakening. We will stay very brief here, as this proceeds exactly as in [5].

A partial order with polarities is **negative** if all its minimal events are. This applies in particular to arenas and R-augmentations. An R-strategy is **negative** if all its R-augmentations are. A negative R-augmentation q ∈ R-Aug(A) is **well-threaded** if for all a ∈ |q|, [a]<sub>q</sub> has exactly one minimal event; an R-strategy is **well-threaded** iff all its R-augmentations are. We have:

**Proposition 2.** Negative arenas *and* negative well-threaded R*-strategies form a cartesian symmetric monoidal closed category* R*-*Strat<sub>−</sub>*, with* 1 *terminal. We also write* σ : A ⊸ B *for morphisms in* R*-*Strat<sub>−</sub>*.*

The closure of R-Strat does not transport to R-Strat<sub>−</sub>, as A<sup>⊥</sup> ∥ B is never negative if A is non-empty; thus we replace it with a negative version. Here we describe only a restricted case of the general construction in [5], which is however sufficient for the types of R-IPA. If A, B are negative arenas and B is **well-opened**, *i.e.* it has exactly one minimal event b, we form A ⊸ B as having all components as in A<sup>⊥</sup> ∥ B, with additional dependencies {((2, b), (1, a)) | a ∈ A}.

**Fig. 13.** Maximal <sup>R</sup>-augmentations of <sup>R</sup>-strategies used in the interpretation

Using the compact closed structure of R-Strat it is easy to build a copycat R-strategy ev<sub>A,B</sub> : (A ⊸ B) ⊗ A ⊸ B, and to associate to any σ : C ⊗ A ⊸ B some Λ(σ) : C ⊸ (A ⊸ B), providing the monoidal closure. The cartesian product of A and B is A & B, with components the same as for A ∥ B except that (1, a) # (2, b) for all a ∈ A, b ∈ B. We write π<sub>i</sub> : A₁ & A₂ ⊸ A<sub>i</sub> for the projections, and ⟨σ, τ⟩ : A ⊸ B & C for the pairing of σ : A ⊸ B and τ : A ⊸ C.

*Interpretation of* R*-IPA.* We set ⟦**com**⟧ = **run**<sup>−</sup> ⋖ **done**<sup>+</sup>, ⟦**bool**⟧ as on the right-hand side of Fig. 9, ⟦**mem**<sub>W</sub>⟧ and ⟦**mem**<sub>R</sub>⟧ as in Fig. 10, and ⟦A ⊸ B⟧ = ⟦A⟧ ⊸ ⟦B⟧ as expected. Contexts Γ = x₁ : A₁, …, x<sub>n</sub> : A<sub>n</sub> are interpreted as ⟦Γ⟧ = ⊗<sub>1≤i≤n</sub> ⟦A<sub>i</sub>⟧. Terms Γ ⊢ M : A are interpreted as ⟦M⟧ : ⟦Γ⟧ ⊸ ⟦A⟧ as follows: ⟦⊥⟧ is the diverging R-strategy (no Player move); ⟦**consume**(α)⟧ has only maximal R-augmentation **run**<sup>−</sup><sub>x</sub> ⋖ **done**<sup>+</sup><sub>x; α</sub>; ⟦**skip**⟧ is ⟦**consume**(0)⟧; and **tt** and **ff** are interpreted similarly with the adequate constant R-strategies. The rest of the interpretation is given below, using the two obvious isos deref : ⟦**mem**<sub>R</sub>⟧ ⊸ ⟦**bool**⟧ and assign : ⟦**mem**<sub>W</sub>⟧ ⊸ ⟦**com**⟧; the R-strategy cell introduced in Fig. 11; and additional R-strategies with typical R-augmentations in Fig. 13. We omit the (standard) clauses for the λ-calculus.

⟦M; N : X⟧ = seq<sub>X</sub> ⊙ (⟦M⟧ ⊗ ⟦N⟧)
⟦M ∥ N : X⟧ = par<sub>X</sub> ⊙ (⟦M⟧ ⊗ ⟦N⟧)
⟦**if** M N₁ N₂ : X⟧ = if<sub>X</sub> ⊙ (⟦M⟧ ⊗ ⟨⟦N₁⟧, ⟦N₂⟧⟩)
⟦!M : **bool**⟧ = deref ⊙ ⟦M⟧
⟦M := **tt** : **com**⟧ = assign ⊙ ⟦M⟧
⟦**new** x, y **in** M : X⟧ = ⟦M⟧ ⊙ (⟦Γ⟧ ⊗ cell)

#### **3.3 Soundness**

Now that we have defined the game semantics of R-IPA, we set to prove that it is sound with respect to the operational semantics given in Sect. 2.2.

We first introduce a useful notation. For any type A, ⟦A⟧ has a unique minimal event; write ⌊⟦A⟧⌋ for the arena without this minimal event. Likewise, if Γ ⊢ M : A, then by construction ⟦M⟧ : ⟦Γ⟧<sup>⊥</sup> ∥ ⟦A⟧ is a negative R-strategy whose augmentations all share the same minimal event **q**<sup>−</sup><sub>x</sub>, where **q**<sup>−</sup> is minimal in ⟦A⟧. For α ∈ R, write ⟦M⟧<sub>α</sub> for ⟦M⟧ without **q**<sup>−</sup><sub>x</sub>, with x replaced by α. Then we have ⟦M⟧<sub>α</sub> : ⟦Γ⟧<sup>⊥</sup> ∥ ⌊⟦A⟧⌋ – one may think of ⟦M⟧<sub>α</sub> as "M started with consumed resource α".

Naively, one may expect soundness to state that for all ⊢ M : **com**, if M ⇓<sub>α</sub>, then ⟦M⟧₀ = **done**<sup>+</sup><sub>α</sub>. However, whereas the resource annotations in the semantics are always as good as permitted by the causal constraints, derivations in the operational semantics may be sub-optimal. For instance, we may derive M ⇓<sub>α</sub> without using the parallel rule at all. So our statement is:

#### **Theorem 1.** *If* ⊢ M : **com** *with* M ⇓<sub>α</sub>*, there is* β ≤<sub>R</sub> α *s.t.* ⟦M⟧₀ = **done**<sup>+</sup><sub>β</sub>*.*

Our proof methodology is standard: we replay operational derivations as augmentations in the denotational semantics. Stating the invariant successfully proved by induction on operational derivations requires some technology.

If s is a store, then write cell<sub>s</sub> : ⟦Ω(s)⟧ for the memory strategy for store s. It is defined as ⊗<sub>ℓ∈dom(s)</sub> cell<sub>s(ℓ)</sub>, where cell<sub>ε</sub> = cell; cell<sub>R<sup>α</sup></sub> is the R-strategy with only maximal R-augmentation **wtt**<sup>−</sup><sub>x</sub> ⋖ **ok**<sup>+</sup><sub>x ∥ α</sub>; cell<sub>W<sup>α</sup></sub> has maximal R-augmentation **r**<sup>−</sup><sub>y</sub> ⋖ **tt**<sup>+</sup><sub>α ∥ y</sub>; and the other cases are the empty R-strategy. If s ≤<sub>M</sub> s′, then s′ can be obtained from s using memory operations, and there is a matching R-augmentation q<sub>s≤s′</sub> ∈ cell<sub>s</sub> defined location-wise in the obvious way.

Now, if σ : ⟦Ω(s)⟧<sup>⊥</sup> ∥ ⟨A⟩ is an R-strategy and q ∈ σ with moves only in ⟦Ω(s)⟧<sup>⊥</sup> is causally compatible with q<sub>s⊳s′</sub>, we define the **residual** of σ after q:

$$\sigma/(\mathsf{q}\circledast \mathsf{q}\_{s\rhd s'}) : \llbracket\Omega(s')\rrbracket^\perp \parallel \langle A\rangle$$

If p ∈ σ with q → p, we first write p′ = p/(q ⊛ q<sub>s⊳s′</sub>) for the R-augmentation with |p′| = |p| \ |q|, and with causal order the restriction of that of p. For e ∈ |p′|, we set λ<sub>p′</sub>(e) to be λ<sub>p</sub>(e) whose arguments corresponding to negative events e′ in q are instantiated with λ<sub>q⊛q<sub>s⊳s′</sub></sub>(e′) ∈ R. With that, we set σ/(q ⊛ q<sub>s⊳s′</sub>) as comprising all p/(q ⊛ q<sub>s⊳s′</sub>) for p ∈ σ with q → p.

Informally, this means that, considering some q which represents a scheduling of the memory operations turning s into s′, we extract from σ its behavior after the execution of these memory operations. Finally, we generalize ≤<sub>R</sub> to R-augmentations by setting q ≤<sub>R</sub> q′ iff they have the same underlying partial order and, for all e ∈ |q|, λ<sub>q</sub>(e) ≤<sub>R</sub> λ<sub>q′</sub>(e). With that, we can finally state:

**Lemma 1.** *Let* Ω(s₁) ⊢ M : A *and* ⟨M, s₁, α⟩ ⇒ ⟨M′, s′₁ ⊎ s′₂, α′⟩ *with* dom(s₁) = dom(s′₁)*, and all resource annotations in* s₁ *lower than* α*. Then, there is* q ∈ ⟦M⟧<sub>α</sub> *with events in* ⟦Ω(s₁)⟧<sup>⊥</sup>*, causally compatible with* q<sub>s₁⊳s′₁</sub>*, and a function*

$$\phi : \llbracket M'\rrbracket\_{\alpha'} \circledast \mathsf{cell}\_{s'\_2} \longrightarrow \llbracket M\rrbracket\_{\alpha}/(\mathsf{q} \circledast \mathsf{q}\_{s\_1\rhd s'\_1})$$

*preserving* → *and s.t. for all* p′ ⊛ q<sub>s′₂</sub> ∈ ⟦M′⟧<sub>α′</sub> ⊛ cell<sub>s′₂</sub>*,* ϕ(p′ ⊛ q<sub>s′₂</sub>) ≤<sub>R</sub> p′ ⊛ q<sub>s′₂</sub>*.*

This is proved by induction on the operational semantics – the critical cases are: assignment and dereferencing, exploiting that if α ≤<sub>R</sub> β, then α ∨ β = β (which boils down to idempotence); and parallel composition, where compatibility of s′ and s′′ entails that the corresponding augmentations of cell<sub>s</sub> are compatible.
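For R = R<sub>+</sub> (time), the join is simply the maximum; the idempotence fact invoked above can be checked concretely. A minimal sketch of ours, representing resource terms as plain floats (not the paper's formalisation):

```python
# Resource terms over R = R+ (non-negative reals) with join given by max.
# The join is idempotent, so alpha <= beta implies join(alpha, beta) == beta:
# the fact used in the assignment and dereferencing cases.

def join(alpha: float, beta: float) -> float:
    return max(alpha, beta)

assert join(2.0, 2.0) == 2.0                 # idempotence
for alpha, beta in [(0.0, 1.0), (1.5, 3.0), (2.0, 2.0)]:
    assert alpha <= beta and join(alpha, beta) == beta
```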

Lemma 1, instantiated with ⟨M, ∅, 0⟩ ⇒ ⟨**skip**, s, α⟩, yields soundness.

*Non-adequacy.* Our model is not adequate. To see why, consider:

$$\vdash \mathbf{new} \, x\_W, x\_R \, \mathbf{in} \left( \begin{array}{l} \mathbf{wait}(1); \\ x\_W := \mathbf{tt}; \\ \mathbf{wait}(2) \end{array} \, \middle\|\, \begin{array}{l} !x\_R; \\ \mathbf{wait}(1) \end{array} \right) : \mathbf{bool}$$

Our model predicts that this may evaluate to **tt** in 3 s (see Fig. 12) and to **ff** in 4 s. However, the operational semantics can only evaluate it (both to **tt** and **ff**) in 4 s. Intuitively, the reason is that the causal shapes implicit in the reduction ⇒ are all series-parallel (generated with sequential and parallel composition), whereas the interaction in Fig. 12 is not.

Our causal semantic approach yields a finer resource analysis than that achieved by the parallel operational semantics. The operational semantics, rather than our model, is to blame for non-adequacy: indeed, we now show that for R = R<sub>+</sub> our model is adequate *w.r.t.* an operational semantics specialized for time.

#### **4 Adequacy for Time**

For time, we may refine the operational semantics by adding the following rule

$$
\langle \mathbf{wait}(t\_1 + t\_2), s, t\_0 \rangle \to \langle \mathbf{wait}(t\_2), s, t\_0 + t\_1 \rangle
$$

using which the program above evaluates to **tt** in 3 s. It is clear that the soundness theorem of the previous section is retained.

We first focus on adequacy for first-order programs without abstraction or application, written Ω(s) ⊢<sub>1</sub> M : **com**. For any t₀ ∈ R<sub>+</sub> there is ⟨M, s, t₀⟩ ⇒ ⟨M′, s ⊎ s′, t₀⟩ where ⟦M⟧<sub>t₀</sub> = ⟦M′⟧<sub>t₀</sub> ⊛ cell<sub>s′</sub> and M′ is in **canonical form**: it cannot be decomposed as C[**skip**; N], C[**skip** ∥ N], C[N ∥ **skip**], C[**if tt** N₁ N₂], C[**if ff** N₁ N₂], C[**wait**(0)] or C[**new** x, y **in** N] for C[] an evaluation context.

Consider Ω(s) ⊢<sub>1</sub> M : **com**, and q ∈ ⟦M⟧<sub>t₀</sub> ⊛ cell<sub>s</sub> with a top element **done**<sup>+</sup><sub>t<sub>f</sub></sub> in **com**, the **result** – *i.e.* q describes an interaction between ⟦M⟧<sub>t₀</sub> and the memory leading to a successful evaluation to **done** at time t<sub>f</sub>. To prove adequacy, we must extract from it a derivation from ⟨M, s, t₀⟩ terminating at time t<sub>f</sub>.

Apart from the top **done**<sup>+</sup><sub>t<sub>f</sub></sub>, q only records memory operations, which we must replicate operationally in an adequate order. A **minimal operation with timing** t is either the top **done**<sup>+</sup><sub>t</sub> if it is the only event in q, or a prefix (m<sub>t</sub> → n<sub>t</sub>) of q corresponding to a memory operation (for instance, in the augmentations of Fig. 14, the only minimal operation has timing 2). If t = t₀, this operation should be performed immediately. If t > t₀ we need to spend time to trigger it – it is then critical to spend time on *all available* **wait***s in parallel*:

**Lemma 2.** *For* Ω(s) ⊢<sub>1</sub> M : **com** *in canonical form,* t₀ ∈ R<sub>+</sub>*, and* q ∈ ⟦M⟧<sub>t₀</sub> ⊛ cell<sub>s</sub> *with result* **done**<sup>+</sup><sub>t<sub>f</sub></sub>*, if all minimal operations have timing strictly greater than* t₀*, then*

$$
\langle M, s, t\_0 \rangle \equiv \langle M', s, t\_0 + t \rangle
$$

*for some* t > 0*, where* M′ *differs from* M *only by having smaller annotations in* **wait** *commands, with at least one* **wait** *changed to* **skip***.*

*Furthermore, there is* q′ ≤<sub>R</sub> q *with* q′ ∈ ⟦M′⟧<sub>t₀+t</sub> ⊛ cell<sub>s</sub> *with result* **done**<sup>+</sup><sub>t<sub>f</sub></sub>*.*

**Fig. 14.** Spending time adequately (where **test** *<sup>M</sup>* <sup>=</sup> **if** *<sup>M</sup>* **skip** <sup>⊥</sup>)

*Proof.* As M is in canonical form, all delays in minimal operations are incurred by **wait**(t) commands in head position (*i.e.* such that M = C[**wait**(t)]). Let t<sub>min</sub> be the minimal time appearing in those **wait**(−) commands in head position. Using our new rule and parallel composition, we subtract t<sub>min</sub> from all such occurrences of **wait**(−); we then transform the resulting occurrences of **wait**(0) into **skip**.

A representative example is displayed in Fig. 14. In the second step, though !x<sub>R</sub> is available immediately, we must wait to get the right result.

With that we can prove the key lemma towards adequacy.

**Lemma 3.** *Let* Ω(s) ⊢<sub>1</sub> M : **com***,* t₀ ∈ R<sub>+</sub>*, and* q ∈ ⟦M⟧<sub>t₀</sub> ⊛ cell<sub>s</sub> *with result* **done**<sup>+</sup><sub>t<sub>f</sub></sub> *in* **com***. Then, there is* ⟨M, s, t₀⟩ ⇒ ⟨**skip**, −, t<sub>f</sub>⟩*.*

*Proof.* By induction on the size of M. First, we convert M to canonical form. If all minimal operations in q ∈ ⟦M⟧<sub>t₀</sub> ⊛ cell<sub>s</sub> have timing strictly greater than t₀, we apply Lemma 2 and conclude by the induction hypothesis.

Otherwise, at least one minimal operation has timing t₀. If it is the result **done**<sup>+</sup><sub>t₀</sub> in **com**, then M is the constant **skip**. Otherwise, it is a memory operation, say p → q with p = (**r**<sub>t₀</sub> **b**<sub>t₀</sub>); write also s′ = s[ℓ ↦ s(ℓ)·R<sub>t₀</sub>]. It follows then by an induction on M that M = C[!ℓ<sub>R</sub>] for some C[], with

$$\mathsf{q}/(\mathsf{p}\circledast \mathsf{q}\_{s\rhd s'}) \in \llbracket C[b] \rrbracket\_{t\_0} \circledast \mathsf{cell}\_{s'}$$

so ⟨M, s, t₀⟩ ⇒ ⟨C[b], s′, t₀⟩ ⇒ ⟨**skip**, −, t<sub>f</sub>⟩ by the induction hypothesis.

Adequacy follows for higher-order programs: in general, any ⊢ M : **com** can be β-reduced to a first-order M′, leaving the interpretation unchanged. By Church-Rosser, M behaves like M′ operationally, up to weak bisimulation. Hence:

**Theorem 2.** *Let* ⊢ M : **com***. For any* t ∈ R<sub>+</sub>*, if* **done**<sup>+</sup><sub>t</sub> ∈ ⟦M⟧₀ *then* M ⇓<sub>t</sub>*.*

### **5 Conclusion**

It would be interesting to compare our model with structures used in timing analysis; for instance, [23] relies on a concurrent generalization of control flow graphs that is reminiscent of event structures. In future work we also plan to investigate whether our annotated model construction could be used for other purposes, such as symbolic execution or abstract interpretation.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Change Actions: Models of Generalised Differentiation**

Mario Alvarez-Picallo(B) and C.-H. Luke Ong(B)

University of Oxford, Oxford, UK {mario.alvarez-picallo,luke.ong}@cs.ox.ac.uk

**Abstract.** Change structures, introduced by Cai et al., have recently been proposed as a semantic framework for incremental computation. We generalise change actions, an alternative to change structures, to arbitrary cartesian categories and propose the notion of *change action model* as a categorical model for (higher-order) generalised differentiation. Change action models naturally arise from many geometric and computational settings, such as (generalised) cartesian differential categories, group models of discrete calculus, and Kleene algebras of regular expressions. We show how to build canonical change action models on arbitrary cartesian categories, reminiscent of the Faà di Bruno construction.

### **1 Introduction**

Incremental computation is the process of incrementally updating the output of some given function as the input is gradually changed, without recomputing the entire function from scratch. Recently, Cai et al. [6] introduced the notion of change structure to give a semantic account of incremental computation. Change structures have subsequently been generalised to *change actions* [2], and proposed as a model for automatic differentiation [16]. These developments raise a number of questions about the structure of change actions themselves and how they relate to more traditional notions of differentiation.

A *change action* A = (|A|, ΔA, ⊕<sub>A</sub>, +<sub>A</sub>, 0<sub>A</sub>) is a set |A| equipped with a monoid (ΔA, +<sub>A</sub>, 0<sub>A</sub>) acting on it, via the action ⊕<sub>A</sub> : |A| × ΔA → |A|. For example, every monoid (S, +, 0) gives rise to a (so-called *monoidal*) change action (S, S, +, +, 0). Given change actions A and B, consider functions f : |A| → |B|. A *derivative* of f is a function ∂f : |A| × ΔA → ΔB such that for all a ∈ |A|, δa ∈ ΔA, f(a ⊕<sub>A</sub> δa) = f(a) ⊕<sub>B</sub> ∂f(a, δa). Change actions and differentiable functions (i.e. functions that have a regular derivative) organise themselves into categories (and indeed 2-categories) with finite (co)products, whereby morphisms are composed via the chain rule.
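To make the derivative condition concrete, here is a small sketch of ours (not from the paper) over the monoidal change action (ℤ, ℤ, +, +, 0), where f(x) = x² admits the derivative ∂f(a, δa) = 2aδa + δa²:

```python
# Monoidal change action (Z, Z, +, +, 0): changes are integers, acting by addition.
def act(a: int, da: int) -> int:
    return a + da

def f(x: int) -> int:
    return x * x

# A derivative for f: satisfies f(a (+) da) = f(a) (+) df(a, da).
def df(a: int, da: int) -> int:
    return 2 * a * da + da * da

# Check the derivative condition pointwise on a small grid.
for a in range(-5, 6):
    for da in range(-5, 6):
        assert f(act(a, da)) == act(f(a), df(a, da))
```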

The definition of change actions (and derivatives of functions) makes no use of properties of **Set** beyond the existence of products. We develop the theory of change actions on arbitrary cartesian categories and study their properties. A first contribution is the notion of a *change action model*, which is defined to be a coalgebra for a certain (copointed) endofunctor CAct on the category **Cat**<sup>×</sup> of (small) cartesian categories. The functor CAct sends a category **C** to the category CAct(**C**) of (internal) change actions and differential maps on **C**.

There is a natural, extrinsic, notion of higher-order derivative in change action models. In such a model α : **C** → CAct(**C**), a **C**-object A is associated (via α) with a change action, the carrier object of whose monoid is in turn associated with a change action, and so on *ad infinitum*. We construct a "canonical" change action model, CActω(**C**), that internalises such ω-sequences that exhibit higher-order differentiation. Objects of CActω(**C**) are ω-sequences of "contiguously compatible" change actions; and morphisms are corresponding ω-sequences of differential maps, each map being the canonical (via α) derivative of the preceding in the ω-sequence. We show that CActω(**C**) is the final CAct-coalgebra (relativised to change action models on **C**). The category CActω(**C**) may be viewed as a kind of Faà di Bruno construction [8,10] in the more general setting of change action models.

Change action models capture many versions of differentiation that arise in mathematics and computer science. We illustrate their generality via three examples. The first, *(generalised) cartesian differential categories* (GCDC) [4,10], are themselves an axiomatisation of the essential properties of the derivative. We show that a GCDC **C**—which by definition associates every object A with a monoid L(A) = (L<sub>0</sub>(A), +<sub>A</sub>, 0<sub>A</sub>)—gives rise to change action models in various non-trivial ways.

Secondly we show how discrete differentiation in both the *calculus of finite differences* [15] and *Boolean differential calculus* [22,23] can be modelled using the full subcategory **GrpSet** of **Set** whose objects are groups. Our unifying formulation generalises these discrete calculi to arbitrary groups, and gives an account of the chain rule in these settings.
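As a small illustration of ours (under the group view, where changes act by the group operation), every f : ℤ → ℤ is differentiable with the finite-difference derivative ∂f(x, δx) = f(x + δx) − f(x):

```python
# Calculus of finite differences on the group (Z, +, 0):
# every f : Z -> Z is differentiable, with derivative f(x + dx) - f(x).
def d(f):
    return lambda x, dx: f(x + dx) - f(x)

f = lambda x: x ** 3
df = d(f)

# Derivative condition: f(x + dx) == f(x) + df(x, dx)
for x in range(-4, 5):
    for dx in range(-4, 5):
        assert f(x + dx) == f(x) + df(x, dx)

# The classical forward difference is the special case dx = 1.
assert df(2, 1) == 3 ** 3 - 2 ** 3
```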

Our third example is differentiation of regular expressions. Recall that a Kleene algebra is the algebraic structure of regular expressions. We show that the algebra of polynomials over a commutative Kleene algebra is a change action model.

*Outline.* In Sect. 2 we present the basic definitions of change actions and differential maps, and show how they can be organised into categories. The theory of change action is extended to arbitrary cartesian categories **C** in Sect. 3: we introduce the category CAct(**C**) of internal change actions on **C**. In Sect. 4 we present change action models, and properties of the tangent bundle functors. In Sect. 5 we illustrate the unifying power of change action models via three examples. In Sect. 6, we study the category CActω(**C**) of <sup>ω</sup>-change actions and ω-differential maps. Missing proofs are provided in an extended version of the present paper [1].

### **2 Change Actions**

<sup>A</sup> *change action* is a tuple A = (|A|, ΔA, <sup>⊕</sup>A, <sup>+</sup>A, <sup>0</sup>A) where <sup>|</sup>A<sup>|</sup> and ΔA are sets, (ΔA, <sup>+</sup>A, <sup>0</sup>A) is a monoid, and <sup>⊕</sup><sup>A</sup> : <sup>|</sup>A| × ΔA → |A<sup>|</sup> is an action of the monoid on <sup>|</sup>A|. <sup>1</sup> We omit the subscript from <sup>⊕</sup>A, <sup>+</sup><sup>A</sup> and 0<sup>A</sup> whenever we can.

**Definition 1 (Derivative condition).** Let A and B be change actions. A function f : <sup>|</sup>A|→|B<sup>|</sup> is *differentiable* if there is a function ∂f : <sup>|</sup>A|×ΔA <sup>→</sup> ΔB satisfying <sup>f</sup>(<sup>a</sup> <sup>⊕</sup><sup>A</sup> δa) = <sup>f</sup>(a) <sup>⊕</sup><sup>B</sup> ∂f(a, δa), for all <sup>a</sup> ∈ |A|, δa <sup>∈</sup> ΔA. We call ∂f <sup>a</sup> *derivative* for f, and write f : A <sup>→</sup> B whenever f is differentiable.

**Lemma 1 (Chain rule).** *Given* f : A <sup>→</sup> B *and* g : B <sup>→</sup> C *with derivatives* ∂f *and* ∂g *respectively, the function* ∂(g ◦ f) : <sup>|</sup>A| × ΔA <sup>→</sup> ΔC *defined by* ∂(g ◦ f)(a, δa) := ∂g(f(a), ∂f(a, δa)) *is a derivative for* g ◦ f : <sup>|</sup>A|→|C|*.*

*Proof.* Unpacking the definition, we have (<sup>g</sup> ◦ <sup>f</sup>)(a) <sup>⊕</sup><sup>C</sup> <sup>∂</sup>(<sup>g</sup> ◦ <sup>f</sup>)(a, δa) = <sup>g</sup>(f(a)) <sup>⊕</sup><sup>C</sup> ∂g(f(a), ∂f(a, δa)) = <sup>g</sup>(f(a) <sup>⊕</sup><sup>B</sup> ∂f(a, δa)) = <sup>g</sup>(f(<sup>a</sup> <sup>⊕</sup><sup>A</sup> δa)), as desired.
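Reusing the integer sketch of ours from the introduction (names ours), the chain-rule formula ∂(g ∘ f)(a, δa) := ∂g(f(a), ∂f(a, δa)) can be checked directly against the derivative condition:

```python
# Chain rule over the monoidal change action (Z, Z, +, +, 0).
f = lambda x: x * x
df = lambda a, da: 2 * a * da + da * da   # a derivative for f

g = lambda y: 3 * y + 1
dg = lambda b, db: 3 * db                 # a derivative for g

# d(g o f)(a, da) := dg(f(a), df(a, da))
dgf = lambda a, da: dg(f(a), df(a, da))

# Derivative condition for the composite g o f.
for a in range(-5, 6):
    for da in range(-5, 6):
        assert g(f(a + da)) == g(f(a)) + dgf(a, da)
```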

*Example 1 (Some useful change actions).*


**Regular Derivatives.** The preceding definitions neither assume nor guarantee that a derivative is additive (i.e. it may not satisfy ∂f(x, δa + δb) = ∂f(x, δa) + ∂f(x, δb)), as derivatives are in standard differential calculus. The strictly weaker condition that we will now require is *regularity*: a derivative that is additive in its second argument is regular, but not vice versa. Under some conditions, the converse is also true.

**Definition 2.** Given a differentiable map f : A → B, a derivative ∂f for f is *regular* if, for all a ∈ |A| and δa, δb ∈ ΔA, we have ∂f(a, 0<sub>A</sub>) = 0<sub>B</sub> and ∂f(a, δa +<sub>A</sub> δb) = ∂f(a, δa) +<sub>B</sub> ∂f(a ⊕<sub>A</sub> δa, δb).
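For example (an illustration of ours), the derivative ∂f(a, δa) = 2aδa + δa² of squaring on ℤ satisfies both regularity equations while failing additivity in its second argument:

```python
# Regularity of df(a, da) = 2*a*da + da*da over (Z, Z, +, +, 0).
df = lambda a, da: 2 * a * da + da * da

for a in range(-4, 5):
    assert df(a, 0) == 0                      # df(a, 0_A) = 0_B
    for da in range(-4, 5):
        for db in range(-4, 5):
            # df(a, da + db) = df(a, da) + df(a (+) da, db)
            assert df(a, da + db) == df(a, da) + df(a + da, db)

# But df is not additive in its second argument:
assert df(0, 1 + 1) != df(0, 1) + df(0, 1)    # 4 on the left, 2 on the right
```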

**Proposition 1.** *Whenever* f : A <sup>→</sup> B *is differentiable and has a unique derivative* ∂f*, this derivative is regular.*

**Proposition 2.** *Given* f : A → B *and* g : B → C *with regular derivatives* ∂f *and* ∂g *respectively, the derivative* ∂(g ◦ f) = ∂g ◦ ⟨f ◦ π<sub>1</sub>, ∂f⟩ *is regular.*

<sup>1</sup> Change actions are closely related to the notion of *change structures* introduced in [6], but differ from the latter in not being dependently typed, not assuming the existence of a ⊖ operator, and requiring ΔA to have a monoid structure compatible with the map ⊕.

**Two Categories of Change Actions.** The study of change actions can be undertaken in two ways: one can consider functions that are differentiable (without choosing a derivative); alternatively, the derivative itself can be considered part of the morphism. The former leads to the category **CAct**−, whose objects are change actions and morphisms are the differentiable maps.

The category **CAct**<sup>−</sup> was the category we originally proposed [2]. It is well-behaved, possessing limits, colimits, and exponentials, which is a trivial corollary of the following result:

**Theorem 1.** *The category* **CAct**<sup>−</sup> *of change actions and differentiable morphisms is equivalent to* **PreOrd***, the category of preorders and monotone maps.*

The actual structure of the limits and colimits in **CAct**<sup>−</sup> is, however, not so satisfactory. One can, for example, obtain the product of two change actions A and B by taking their product in **PreOrd** and turning it into a change action, but the corresponding monoid action map ⊕ is not, in general, easily expressible, even if those for A and B are. Derivatives of morphisms in **CAct**<sup>−</sup> can also be hard to obtain, as exhibiting f as a morphism in **CAct**<sup>−</sup> merely proves it is differentiable but gives no clue as to how a derivative might be constructed.

A more constructive approach is to consider a morphism as a function together with a choice of a derivative for it.

**Definition 3.** Given change actions A and B, a *differential map* f : A <sup>→</sup> B is a pair (|f|, ∂f) where <sup>|</sup>f<sup>|</sup> : <sup>|</sup>A|→|B<sup>|</sup> is a function, and ∂f : <sup>|</sup>A| × ΔA <sup>→</sup> ΔB is a regular derivative for <sup>|</sup>f|.

The category **CAct** has change actions as objects and differential maps as morphisms. The identity morphisms are (Id<sub>A</sub>, π<sub>2</sub>); given morphisms f : A → B and g : B → C, define the composite g ◦ f := (|g| ◦ |f|, ∂g ◦ ⟨|f| ◦ π<sub>1</sub>, ∂f⟩) : A → C.
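Concretely, a differential map can be represented as a pair of a function and its chosen derivative, with composition implementing the chain rule. A sketch of ours over the monoidal change action on ℤ:

```python
# A differential map as a pair (f, df); composition follows the chain rule.
def compose(g_pair, f_pair):
    g, dg = g_pair
    f, df = f_pair
    return (lambda x: g(f(x)),
            lambda a, da: dg(f(a), df(a, da)))

# Identity: derivative is the second projection of (a, da).
identity = (lambda x: x, lambda a, da: da)

square = (lambda x: x * x, lambda a, da: 2 * a * da + da * da)
triple = (lambda x: 3 * x, lambda a, da: 3 * da)

h, dh = compose(triple, square)
assert h(4) == 48
assert dh(2, 1) == 3 * (2 * 2 * 1 + 1)   # dg(f(2), df(2, 1))
for a in range(-4, 5):
    for da in range(-4, 5):
        assert h(a + da) == h(a) + dh(a, da)
```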

Finite products and coproducts exist in **CAct** (see Theorems 2 and 4 for a more general statement). Whether limits and colimits exist in **CAct** beyond products and coproducts is open.

*Remark 1.* If one thinks of changes (i.e. elements of ΔA) as morphisms between elements of <sup>|</sup>A|, then regularity resembles functoriality. This intuition is explored in [1, Appendix F], where we show that categories of change actions organise themselves into 2-categories.

#### **3 Change Actions on Arbitrary Categories**

The definition of change actions makes no use of any properties of **Set** beyond the existence of products. Indeed, change actions can be characterised as just a kind of multi-sorted algebra, which is definable in any category with products.

**The Category CAct(C).** Consider the category **Cat**<sup>×</sup> of (small) cartesian categories (i.e. categories with chosen finite products) and product-preserving functors. We can define an endofunctor CAct : **Cat**<sup>×</sup> → **Cat**<sup>×</sup> sending a category **C** to the category of (internal) change actions on **C**.

The objects of CAct(**C**) are tuples A = (|A|, ΔA, ⊕<sub>A</sub>, +<sub>A</sub>, 0<sub>A</sub>) where |A| and ΔA are (arbitrary) objects in **C**, (ΔA, +<sub>A</sub>, 0<sub>A</sub>) is a monoid object in **C**, and ⊕<sub>A</sub> : |A| × ΔA → |A| is a monoid action in **C**, i.e. a **C**-morphism satisfying, for all a : C → |A| and δ<sub>1</sub>a, δ<sub>2</sub>a : C → ΔA:

$$
\oplus\_A \circ \langle a, 0\_A \circ ! \rangle = a$$

$$
\oplus\_A \circ \langle a, +\_A \circ \langle \delta\_1 a, \delta\_2 a \rangle \rangle = \oplus\_A \circ \langle \oplus\_A \circ \langle a, \delta\_1 a \rangle, \delta\_2 a \rangle$$

Given objects A, B in CAct(**C**), the morphisms in CAct(**C**)(A, B) are pairs f = (|f|, ∂f) where |f| : |A| → |B| and ∂f : |A| × ΔA → ΔB are morphisms in **C**, satisfying a diagrammatic version of the derivative condition:

$$\begin{array}{ccc} |A| \times \Delta A & \xrightarrow{\langle |f| \circ \pi\_1, \, \partial f \rangle} & |B| \times \Delta B \\ {\scriptstyle \oplus\_A} \downarrow & & \downarrow {\scriptstyle \oplus\_B} \\ |A| & \xrightarrow{|f|} & |B| \end{array}$$

Additionally, we require our derivatives to be regular, as in Definition 2, i.e. for all morphisms a : C → |A| and δ<sub>1</sub>a, δ<sub>2</sub>a : C → ΔA, the following equations hold:

$$\begin{aligned} \partial f \circ \langle a, 0\_A \circ ! \rangle &= 0\_B \circ !\\ \partial f \circ \langle a, +\_A \circ \langle \delta\_1 a, \delta\_2 a \rangle \rangle &= +\_B \circ \langle \partial f \circ \langle a, \delta\_1 a \rangle, \partial f \circ \langle \oplus\_A \circ \langle a, \delta\_1 a \rangle, \delta\_2 a \rangle \rangle \end{aligned}$$

The chain rule can then be expressed naturally by pasting two instances of the previous diagram together:

Hence g ◦ f = (|g| ◦ |f|, ∂g ◦ ⟨|f| ◦ π<sub>1</sub>, ∂f⟩).

Now, given a product-preserving functor F : **C** → **D**, there is a corresponding functor CAct(F) : CAct(**C**) → CAct(**D**) given by:

$$\begin{aligned} \text{CAct}(\mathcal{F})(|A|, \Delta A, \oplus\_A, +\_A, 0\_A) &:= (\mathcal{F}(|A|), \mathcal{F}(\Delta A), \mathcal{F}(\oplus\_A), \mathcal{F}(+\_A), \mathcal{F}(0\_A)) \\ \text{CAct}(\mathcal{F})(|f|, \partial f) &:= (\mathcal{F}(|f|), \mathcal{F}(\partial f)) \end{aligned}$$

We can embed **C** fully and faithfully into CAct(**C**) via the functor η<sub>**C**</sub>, which sends an object A of **C** to the "trivial" change action Â = (A, ⊤, π<sub>1</sub>, !, !) and every morphism f : A → B of **C** to the morphism (f, !). As before, this functor extends to a natural transformation from the identity functor to CAct.

Additionally, there is an obvious forgetful functor <sup>ε</sup>**<sup>C</sup>** : CAct(**C**) <sup>→</sup> **<sup>C</sup>**, which defines the components of a natural transformation ε from the functor CAct to the identity endofunctor Id.

Given **<sup>C</sup>**, we write <sup>ξ</sup>**<sup>C</sup>** for the functor CAct(ε**C**) : CAct(CAct(**C**)) <sup>→</sup> CAct(**C**).<sup>2</sup> Explicitly, this functor maps an object (A, B, <sup>⊕</sup>, <sup>+</sup>, 0) in CAct(CAct(**C**)) to the object (|A|, <sup>|</sup>B|, |⊕|, <sup>|</sup>+|, <sup>|</sup>0|). Intuitively, <sup>ε</sup>CAct(**C**) prefers the "original" structure on objects, whereas <sup>ξ</sup>**<sup>C</sup>** prefers the "higher" structure. The equaliser of these two functors is precisely the category of change actions whose higher structure is the original structure.

**Products and Coproducts in CAct(C).** We have defined CAct as an endofunctor on cartesian categories. This is well-defined: if **C** has all finite (co)products, so does CAct(**C**). Let A = (|A|, ΔA, ⊕<sub>A</sub>, +<sub>A</sub>, 0<sub>A</sub>) and B = (|B|, ΔB, ⊕<sub>B</sub>, +<sub>B</sub>, 0<sub>B</sub>) be change actions on **C**. We present their product and coproduct as follows.

**Theorem 2.** *The following change action is the product of* A *and* B *in* CAct(**C**)

$$A \times B := \left( |A| \times |B|, \Delta A \times \Delta B, \oplus\_{A \times B}, +\_{A \times B}, \langle 0\_A, 0\_B \rangle \right)$$

*where* ⊕<sub>A×B</sub> := ⟨⊕<sub>A</sub> ◦ (π<sub>1</sub> × π<sub>1</sub>), ⊕<sub>B</sub> ◦ (π<sub>2</sub> × π<sub>2</sub>)⟩ *and* +<sub>A×B</sub> := ⟨+<sub>A</sub> ◦ (π<sub>1</sub> × π<sub>1</sub>), +<sub>B</sub> ◦ (π<sub>2</sub> × π<sub>2</sub>)⟩*. The projections are* π̂<sub>1</sub> = (π<sub>1</sub>, π<sub>1</sub> ◦ π<sub>2</sub>) *and* π̂<sub>2</sub> = (π<sub>2</sub>, π<sub>2</sub> ◦ π<sub>2</sub>)*, writing* f̂ *for maps in* CAct(**C**) *to distinguish them from* **C***-maps.*

**Theorem 3.** *The change action* ⊤̂ = (⊤, ⊤, π<sub>1</sub>, π<sub>1</sub>, Id) *is the terminal object in* CAct(**C**)*, where* ⊤ *is the terminal object of* **C***. Furthermore, if* A *is a change action, every point* |f| : ⊤ → |A| *in* **C** *is differentiable, with (unique) derivative* 0<sub>A</sub>*.*

Whenever we have a differential map f : A × B → C between change actions, we can compute its derivative ∂f by adding together its "partial" derivatives:<sup>3</sup>

**Lemma 2.** *Let* f : A <sup>×</sup> B <sup>→</sup> C *be a differential map. Then*

$$\partial f((a,b),(\delta a,\delta b)) = +\_C \circ \langle \partial f((a,b),(\delta a,0\_B)), \partial f((\oplus\_A \circ \langle a,\delta a \rangle,b), (0\_A,\delta b)) \rangle$$

*(The notational abuse is justified by the internal logic of a cartesian category.)*
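For instance (a sketch of ours in **Set**, over integer change actions), f(x, y) = x·y has derivative ∂f((a, b), (δa, δb)) = aδb + bδa + δaδb, and the decomposition of Lemma 2 can be checked pointwise:

```python
# f(x, y) = x * y over the product change action on Z x Z.
f = lambda a, b: a * b

# A derivative: f((a, b) (+) (da, db)) = f(a, b) + df(a, b, da, db).
df = lambda a, b, da, db: a * db + b * da + da * db

for a in range(-3, 4):
    for b in range(-3, 4):
        for da in range(-3, 4):
            for db in range(-3, 4):
                assert f(a + da, b + db) == f(a, b) + df(a, b, da, db)
                # Lemma 2: full derivative = first partial + shifted second partial.
                assert df(a, b, da, db) == df(a, b, da, 0) + df(a + da, b, 0, db)
```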

**Theorem 4.** *If* **C** *is distributive, with law* δ<sub>A,B,C</sub> : (A ⊔ B) × C → (A × C) ⊔ (B × C)*, the following change action is the coproduct of* A *and* B *in* CAct(**C**)*:*

$$A \sqcup B := \left( |A| \sqcup |B|, \Delta A \times \Delta B, \oplus\_{A \sqcup B}, +\_{A \sqcup B}, \langle 0\_A, 0\_B \rangle \right).$$

*where* ⊕<sub>A⊔B</sub> := [⊕<sub>A</sub> ◦ (Id<sub>A</sub> × π<sub>1</sub>), ⊕<sub>B</sub> ◦ (Id<sub>B</sub> × π<sub>2</sub>)] ◦ δ<sub>A,B,C</sub>*, and* +<sub>A⊔B</sub> := ⟨+<sub>A</sub> ◦ (π<sub>1</sub> × π<sub>1</sub>), +<sub>B</sub> ◦ (π<sub>2</sub> × π<sub>2</sub>)⟩*. The injections are* ι̂<sub>1</sub> = (ι<sub>1</sub>, ⟨π<sub>2</sub>, 0<sub>B</sub>⟩) *and* ι̂<sub>2</sub> = (ι<sub>2</sub>, ⟨0<sub>A</sub>, π<sub>2</sub>⟩)*.*

<sup>2</sup> One might expect CAct to be a comonad with ε as a counit. But if this were the case, we would have ξ<sub>**C**</sub> = ε<sub>CAct(**C**)</sub>, which is, in general, not true.

<sup>3</sup> Alternatively, one can define the (first) partial derivative of a map f(x, y) as a map δ<sub>1</sub>f such that f(x ⊕ δx, y) = f(x, y) ⊕ δ<sub>1</sub>f(x, y, δx). It can be shown that a map is differentiable iff its first and second partial derivatives exist.

**Stable Derivatives and Additivity.** We do not require derivatives to be additive in their second argument; indeed in many cases they are not. Under some simple conditions, however, (regular) derivatives can be shown to be additive.

**Definition 4.** Given an (internal) change action A and objects |B|, |C| in a cartesian category **C**, a morphism u : |A| × |B| → |C| is *stable* whenever the following diagram commutes:

$$\begin{array}{ccc} (|A| \times \Delta A) \times |B| & \xrightarrow{\pi\_1 \times \mathrm{Id}} & |A| \times |B| \\ {\scriptstyle \oplus\_A \times \mathrm{Id}} \downarrow & & \downarrow {\scriptstyle u} \\ |A| \times |B| & \xrightarrow{u} & |C| \end{array}$$

If one thinks of ΔA as the object of "infinitesimal" transformations on <sup>|</sup>A|, then the preceding definition says that a morphism u : <sup>|</sup>A|×|B|→|C<sup>|</sup> is stable whenever infinitesimal changes on the input A do not affect its output.

**Lemma 3.** *Let* f = (|f|, ∂f) *be a differential map in* CAct(**C**)*. If* ∂f *is stable, then it is additive in its second argument*4*, i.e. for all* x, δ1x, δ2x *we have:*

$$
\partial f \circ \langle x, +\_A \circ \langle \delta\_1 x, \delta\_2 x \rangle \rangle = + \circ \langle \partial f \circ \langle x, \delta\_1 x \rangle, \partial f \circ \langle x, \delta\_2 x \rangle \rangle
$$
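As a sanity check of ours: a derivative that ignores its base point, such as ∂f(x, δx) = 3δx for f(x) = 3x, is stable, and as Lemma 3 predicts it is additive in its second argument:

```python
# f(x) = 3x with derivative df(x, dx) = 3*dx over (Z, Z, +, +, 0).
# df ignores its base point x, so it is stable; Lemma 3 then
# predicts additivity in the second argument.
f = lambda x: 3 * x
df = lambda x, dx: 3 * dx

for x in range(-4, 5):
    for dx in range(-4, 5):
        assert f(x + dx) == f(x) + df(x, dx)             # derivative condition
        for e in range(-4, 5):
            assert df(x + e, dx) == df(x, dx)            # stability: base shift irrelevant
            assert df(x, dx + e) == df(x, dx) + df(x, e) # additivity
```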

**Lemma 4.** *Let* f = (|f|, ∂f) *and* g = (|g|, ∂g) *be differential maps, with* ∂g *stable. Then* ∂(g ◦ f) *is stable.*

It is straightforward to see that the category Stab(**C**) of change actions and differential maps with stable derivatives is a subcategory of CAct(**C**).

#### **4 Higher-Order Derivatives: The Extrinsic View**

In this section we study categories in which every object is equipped with a change action, and every morphism specifies a corresponding differential map. This provides a simple way of characterising categories which are models of higher-order differentiation purely in terms of change actions.

**Change Action Models.** Recall that a *copointed endofunctor* is a pair (F, σ) where the endofunctor F : **C** → **C** is equipped with a natural transformation σ : F ⇒ Id. A *coalgebra of a copointed endofunctor* (F, σ) is an object A of **C** together with a morphism α : A → FA such that σ_A ∘ α = Id_A.

**Definition 5.** We call a coalgebra α : **C** → CAct(**C**) of the copointed endofunctor (CAct, ε) a *change action model* (on **C**).

*Assumption*. Throughout Sect. 4, we fix a change action model α : **C** → CAct(**C**).

Given an object A of **C**, the coalgebra α specifies an (internal) change action α(A) = (A, ΔA, ⊕_A, +_A, 0_A) in CAct(**C**). (We abuse notation and write ΔA for the carrier object of the monoid specified in α(A); similarly for +_A, ⊕_A and 0_A.) Given a morphism f : A → B in **C**, there is an associated differential map

<sup>4</sup> Note that the converse is not the case, i.e. a derivative can be additive but not stable.

α(f) = (f, ∂f) : α(A) → α(B). Since ∂f : A × ΔA → ΔB is also a **C**-morphism, there is a corresponding differential map α(∂f) = (∂f, ∂²f) in CAct(**C**), where ∂²f : (A × ΔA) × (ΔA × Δ²A) → Δ²B is a second derivative for f. Iterating this process, we obtain an n-th derivative ∂ⁿf for every **C**-morphism f. Thus change action models offer a setting for reasoning about higher-order differentiation.
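For a concrete instance of this iteration, take the finite-difference change action (ℤ, ℤ, +): applying the derivative operator to ∂f (whose domain is a product) produces the usual second forward difference. A hedged Python sketch, with componentwise addition standing in for ⊕ and all names our own:

```python
# Iterating alpha on the change action (Z, Z, +): the derivative of f is
# df(x, dx) = f(x + dx) - f(x); applying the same operator to df (whose
# domain is the product Z x Z) yields a second derivative d2f with domain
# (Z x Z) x (Z x Z), as in the text. Illustrative sketch only.

def add(a, b):
    """Componentwise sum on (nested tuples of) ints: our stand-in for (+)."""
    if isinstance(a, tuple):
        return tuple(add(x, y) for x, y in zip(a, b))
    return a + b

def deriv(f):
    """df(x, dx) = f(x (+) dx) - f(x); works for f on ints or on tuples."""
    return lambda x, dx: f(add(x, dx)) - f(x)

f = lambda x: x ** 3
df = deriv(f)                        # df  : Z x Z -> Z
d2f = deriv(lambda p: df(*p))        # d2f : (Z x Z) x (Z x Z) -> Z

assert df(2, 1) == 19                # first forward difference of x^3 at 2
# second forward difference of x^3 at x = 2 with unit steps (= 6x + 6):
assert d2f((2, 1), (1, 0)) == df(3, 1) - df(2, 1) == 18
```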

**Tangent Bundles in Change Action Models.** In differential geometry the tangent bundle functor, which maps every manifold to its tangent bundle, is an important construction. There is an endofunctor on change action models reminiscent of the tangent bundle functor, with analogous properties.

**Definition 6.** The *tangent bundle functor* T : **C** → **C** is defined by TA := A × ΔA and Tf := ⟨f ∘ π₁, ∂f⟩.

*Notation*. We use the shorthand π_ij := π_i ∘ π_j.

The tangent bundle functor T preserves products up to isomorphism, i.e. for all objects A, B of **C**, we have T(A × B) ≅ TA × TB and T1 ≅ 1. In particular, φ_{A,B} := ⟨⟨π₁₁, π₁₂⟩, ⟨π₂₁, π₂₂⟩⟩ : TA × TB → T(A × B) is an isomorphism. Consequently, given maps f : A → B and g : A → C, we have, up to the previous isomorphism, T⟨f, g⟩ = ⟨Tf, Tg⟩.
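On the finite-difference change action (ℤ, ℤ, +), the tangent bundle functor can be sketched directly; functoriality, T(g ∘ f) = Tg ∘ Tf, is then exactly the chain rule for these derivatives. A small illustrative check (names ours, not the general categorical construction):

```python
# Sketch of the tangent bundle functor on (Z, Z, +): TA = A x DeltaA and
# Tf(x, dx) = (f(x), df(x, dx)). We check T(g . f) = Tg . Tf numerically;
# this is the chain rule in disguise. Illustrative instance only.

def deriv(f):
    return lambda x, dx: f(x + dx) - f(x)

def tangent(f):
    """Tf = <f o pi_1, df> : (x, dx) |-> (f(x), df(x, dx))."""
    df = deriv(f)
    return lambda x, dx: (f(x), df(x, dx))

f = lambda x: x * x
g = lambda x: x ** 3 - x

Tf, Tg = tangent(f), tangent(g)
Tgf = tangent(lambda x: g(f(x)))

for x in range(-3, 4):
    for dx in range(-3, 4):
        assert Tgf(x, dx) == Tg(*Tf(x, dx))   # functoriality: T(g o f) = Tg o Tf
```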

A consequence of the structure of products in CAct(**C**) is that the map ⊕_{A×B} inherits the pointwise structure in the following sense:

**Lemma 5.** *Let* φ_{A,B} : TA × TB → T(A × B) *be the canonical isomorphism described above. Then* ⊕_{A×B} ∘ φ_{A,B} = ⊕_A × ⊕_B*.*

It will often be convenient to operate directly on the functor T rather than on the underlying derivatives. For this, the following results are useful:

**Lemma 6.** *The following families of morphisms are natural transformations:* π₁, ⊕_A : T(A) → A*;* z := ⟨Id, 0⟩ : A → T(A)*;* l := ⟨⟨π₁, 0⟩, ⟨π₂, 0⟩⟩ : T(A) → T²(A)*. Additionally, the triple* (T, z, T⊕) *defines a monad on* **C***.*

A particularly interesting class of change action models are those that are also cartesian closed. Surprisingly, this has as an immediate consequence that differentiation is itself internal to the category.

**Lemma 7 (Internalisation of derivatives).** *Whenever* **C** *is cartesian closed, there is a morphism* d_{A,B} : (A ⇒ B) → (A × ΔA) ⇒ ΔB *such that, for any morphism* f : 1 × A → B*,* d_{A,B} ∘ Λf = Λ(∂f ∘ ⟨⟨π₁, π₁₂⟩, ⟨π₁, π₂₂⟩⟩)*.*

Under some conditions, we can classify the structure of the exponentials in (CAct, ε)-coalgebras. This requires the existence of an infinitesimal object.<sup>5</sup>

<sup>5</sup> The concept of "infinitesimal object" is borrowed from synthetic differential geometry [18]. However, there is nothing intrinsically "infinitesimal" about such objects here.

**Definition 7.** If **C** is cartesian closed, an *infinitesimal object* D is an object of **C** such that the tangent bundle functor T is represented by the covariant Hom-functor D ⇒ (·), i.e. there is a natural isomorphism φ between the functors D ⇒ (·) and T.

**Lemma 8.** *Whenever there is an infinitesimal object in* **C***, the tangent bundle* T(A ⇒ B) *is naturally isomorphic to* A ⇒ TB*.*

We would like the tangent bundle functor to preserve the exponential structure; in particular, we would expect a result of the form ∂(λy.t)/∂x = λy.(∂t/∂x), which holds in the differential λ-calculus [11]. Unfortunately, it seems impossible to prove in general that this equation holds, although weaker results are available. If the tangent bundle functor is representable, however, additional structure is preserved.

**Theorem 5.** *The isomorphism between the functors* T(A ⇒ (·)) *and* A ⇒ T(·) *respects the structure of* T*, in the sense that the following diagram commutes:*

$$
\begin{array}{ccc}
\mathrm{T}(A \Rightarrow B) & \xrightarrow{\ \cong\ } & A \Rightarrow \mathrm{T}(B) \\
& {\scriptstyle \oplus\_{A \Rightarrow B}} \searrow & \downarrow {\scriptstyle \mathrm{Id}\_A \Rightarrow \oplus\_B} \\
& & A \Rightarrow B
\end{array}
$$

#### **5 Examples of Change Action Models**

**Generalised Cartesian Differential Categories.** *Generalised cartesian differential categories* (GCDC) [10]—a recent generalisation of cartesian differential categories [4]—are models of differential calculi. We show that change action models generalise GCDC in that GCDCs give rise to change action models in three<sup>6</sup> different (non-trivial) ways. In this subsection let **C** be a GCDC (we assume familiarity with the definitions and notations in [10]).

*1. The Flat Model.* Define the functor α : **C** → CAct(**C**) as follows. Let f : A → B be a **C**-morphism. Then α(A) := (A, L₀(A), π₁, +_A, 0_A) and α(f) := (f, D[f]).

**Theorem 6.** *The functor* α *is a change action model.*

*2. The Kleisli Model.* GCDCs admit a tangent bundle functor, defined analogously to the standard notion in differential geometry. Let f : A → B be a **C**-morphism. Define the *tangent bundle functor* T : **C** → **C** by TA := A × L₀(A) and Tf := ⟨f ∘ π₁, D[f]⟩. The functor T is in fact a monad, with unit η = ⟨Id, 0_A⟩ : A → A × L₀(A) and multiplication μ : (A × L₀(A)) × L₀(A)² → A × L₀(A) defined by the composite:

$$(A \times L\_0(A)) \times L\_0(A)^2 \xrightarrow{\langle \pi\_1 \circ \pi\_1, \langle \pi\_2 \circ \pi\_1, \pi\_1 \circ \pi\_2 \rangle \rangle} A \times L\_0(A)^2 \xrightarrow{\mathrm{Id} \times +\_A} A \times L\_0(A)$$

Thus we can define the Kleisli category **C**_T of this monad, which has geometric significance as a category of generalised vector fields.

<sup>6</sup> The third, the Eilenberg-Moore model, is presented in [1, Appendix D].

We define the functor α_T : **C**_T → CAct(**C**_T) as follows: given a **C**_T-morphism f : A → B, set α_T(A) := (A, L₀(A), Id_A × Id_{L₀(A)}, η ∘ +_A, η ∘ 0_A) and α_T(f) := (f, D[f]).

**Lemma 9.** α_T *is a change action model.*

*Remark 2.* The converse is not true: in general the existence of a change action model on **C** does not imply that **C** satisfies the GCDC axioms. However, if one additionally requires (ΔA, +_A, 0_A) to be commutative, with Δ(ΔA) = ΔA and ⊕_{ΔA} = +_A for all objects A, together with some technical conditions (stability and uniqueness of derivatives), then it can be shown that **C** is indeed a GCDC.

**Difference Calculus and Boolean Differential Calculus.** Consider the full subcategory **GrpSet** of **Set** whose objects are all the groups<sup>7</sup>. This is a cartesian closed category which can be endowed with the structure of a (CAct, ε)-coalgebra α in a straightforward way.

Given a group A = (A, +, 0, −), define the change action α(A) := (A, A, +, +, 0). Given a function f : A → B, define the differential map α(f) := (f, ∂f), where ∂f(x, δx) := −f(x) + f(x + δx). Notice that f(x) ⊕ ∂f(x, δx) = f(x) + (−f(x) + f(x + δx)) = f(x + δx) = f(x ⊕ δx); hence ∂f is a derivative for f which is regular (but not necessarily additive), and α(f) is a map in CAct(**GrpSet**). The following result is then immediate.

**Lemma 10.** α : **GrpSet** → CAct(**GrpSet**) *defines a change action model.*

This result is significant: in the calculus of finite differences [15], the *discrete derivative* (or *discrete difference operator*) of a function f : ℤ → ℤ is defined as δf(x) := f(x + 1) − f(x). In fact, the discrete derivative δf is (an instance of) the derivative of f *qua* morphism in **GrpSet**, i.e. δf(x) = ∂f(x, 1).
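The identification δf(x) = ∂f(x, 1), and the derivative condition the group derivative satisfies, can be checked directly. A finite-difference illustration (names ours):

```python
# The discrete difference operator as a change-action derivative: on the
# group (Z, +), df(x, dx) = f(x + dx) - f(x), and the classical
# delta f(x) = f(x + 1) - f(x) is the instance df(x, 1). Sketch only.

def deriv(f):
    return lambda x, dx: f(x + dx) - f(x)

f = lambda x: x * x + 3 * x
df = deriv(f)

# delta f(x) = 2x + 4 for this f, recovered as df(x, 1):
assert [df(x, 1) for x in range(4)] == [4, 6, 8, 10]

# Regularity (the derivative condition): f(x) + df(x, dx) = f(x + dx).
assert all(f(x) + df(x, dx) == f(x + dx)
           for x in range(-5, 6) for dx in range(-5, 6))
```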

Finite difference calculus [13,15] has found applications in combinatorics and numerical computation. Our formulation via a change action model over **GrpSet** has several advantages. First, it justifies the chain rule, which seems new. Second, it generalises the calculus to arbitrary groups. To illustrate this, consider *Boolean differential calculus* [22,23], a technique that applies methods from calculus to the space Bⁿ of vectors of elements of some Boolean algebra B.

**Definition 8.** Given a Boolean algebra B and a function f : Bⁿ → Bᵐ, the i-*th Boolean derivative of* f *at* (u₁, ..., uₙ) ∈ Bⁿ is the value ∂f/∂xᵢ(u₁, ..., uₙ) := f(u₁, ..., uₙ) ⊻ f(u₁, ..., ¬uᵢ, ..., uₙ), writing u ⊻ v := (u ∧ ¬v) ∨ (¬u ∧ v) for exclusive-or.

Now Bⁿ is a **GrpSet**-object (under componentwise ⊻). Set εᵢ := (⊥, ..., ⊥, ⊤, ⊥, ..., ⊥) ∈ Bⁿ, with ⊤ in the i-th position and ⊥ in the remaining n − 1 positions.

**Lemma 11.** *The Boolean derivative of* f : Bⁿ → Bᵐ *coincides with its derivative qua morphism in* **GrpSet***:* ∂f/∂xᵢ(u₁, ..., uₙ) = ∂f((u₁, ..., uₙ), εᵢ)*.*
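For the two-element Boolean algebra B = {⊥, ⊤} this coincidence can be checked exhaustively. A Python sketch (function names are ours; Python's boolean inequality models ⊻):

```python
# Sketch of Lemma 11 for B = {False, True} and n = 2: the Boolean
# derivative (defined via exclusive-or) equals the GrpSet derivative
# df(u, e_i), where e_i has the top element in position i. In the group
# (B^n, xor) every element is its own inverse, so -f(u) + f(u + du)
# becomes f(u) xor f(u xor du). Illustrative names only.

from itertools import product

def xor(u, v):
    return bool(u) != bool(v)

def f(u):                      # an example function B^2 -> B
    u1, u2 = u
    return u1 and not u2

def boolean_deriv(f, i, u):
    """df/dx_i(u) := f(u) xor f(u with its i-th coordinate negated)."""
    flipped = tuple(not x if j == i else x for j, x in enumerate(u))
    return xor(f(u), f(flipped))

def grpset_deriv(f, u, du):
    """df(u, du) = f(u) xor f(u xor du): the derivative in GrpSet."""
    shifted = tuple(xor(x, dx) for x, dx in zip(u, du))
    return xor(f(u), f(shifted))

e = [(True, False), (False, True)]      # e_1, e_2
for i in range(2):
    for u in product([False, True], repeat=2):
        assert boolean_deriv(f, i, u) == grpset_deriv(f, u, e[i])
```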

<sup>7</sup> We consider arbitrary functions, rather than group homomorphisms, since, according to this change action structure, every function between groups is differentiable.

**Polynomials over Commutative Kleene Algebras.** The algebra of polynomials over a commutative Kleene algebra [14,17] (see [12,21] for work in a similar vein) is a change action model. Recall that Kleene algebra is the algebra of regular expressions [5,9]. Formally, a *Kleene algebra* K is a tuple (K, +, ·, ⋆, 0, 1) such that (K, +, ·, 0, 1) is an idempotent semiring satisfying, for all a, b, c ∈ K:

$$1 + a\,a^\star = a^\star \qquad 1 + a^\star a = a^\star \qquad b + a\,c \le c \Rightarrow a^\star b \le c \qquad b + c\,a \le c \Rightarrow b\,a^\star \le c$$

where a <sup>≤</sup> b := a <sup>+</sup> b <sup>=</sup> b. A Kleene algebra is *commutative* whenever · is.

Henceforth fix a commutative Kleene algebra K. Define the *algebra of polynomials* K[x̄] as the free extension of the algebra K with elements x̄ = x₁, ..., xₙ. We write p(ā) for the value of p(x̄) evaluated at x̄ ↦ ā. Polynomials, viewed as functions, are closed under composition: when p ∈ K[x̄] and q₁, ..., qₙ ∈ K[ȳ] are polynomials, so is the composite p(q₁(ȳ), ..., qₙ(ȳ)).

Given a polynomial p = p(x̄), we define its i*-th derivative* ∂p/∂xᵢ(x̄) ∈ K[x̄] by:

$$\frac{\partial \, a}{\partial x\_i}(\overline{x}) = 0 \qquad \frac{\partial \, p^\star }{\partial x\_i}(\overline{x}) = p^\star(\overline{x}) \frac{\partial \, p}{\partial x\_i}(\overline{x}) \qquad \frac{\partial \, x\_j}{\partial x\_i}(\overline{x}) = \begin{cases} 1 \text{ if } i = j \\ 0 \text{ otherwise} \end{cases}$$

$$\frac{\partial \, (p+q)}{\partial x\_i}(\overline{x}) = \frac{\partial \, p}{\partial x\_i}(\overline{x}) + \frac{\partial \, q}{\partial x\_i}(\overline{x}) \qquad \frac{\partial \, (p\, q)}{\partial x\_i}(\overline{x}) = p(\overline{x}) \frac{\partial \, q}{\partial x\_i}(\overline{x}) + q(\overline{x}) \frac{\partial \, p}{\partial x\_i}(\overline{x})$$

We write ∂p/∂xᵢ(ē) for the result of evaluating the polynomial ∂p/∂xᵢ(x̄) at x̄ ↦ ē.

**Theorem 7 (Taylor's formula** [14]**).** *Let* p(x) ∈ K[x]*. For all* a, b ∈ K[x]*, we have* p(a + b) = p(a) + b · (∂p/∂x)(a + b)*.*
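Both the derivative rules and Taylor's formula can be checked in a small concrete commutative Kleene algebra: the min-plus (tropical) algebra on ℕ ∪ {∞}, where + is min, the product is integer addition, 0 is ∞, 1 is 0, and a⋆ = 0. The Python sketch below (the polynomial encoding is our own illustrative choice, not from [14]) implements the one-variable rules and verifies Theorem 7 on small values:

```python
# Derivative rules for K[x] and Taylor's formula (Theorem 7), checked in
# the min-plus (tropical) commutative Kleene algebra: '+' is min, the
# product is integer addition, 0 is infinity, 1 is 0 and a-star is 0.
# The polynomial encoding below is our own illustrative choice.

import math

INF = math.inf

# Polynomials in one variable x, as nested tuples:
# ('const', k) | ('var',) | ('plus', p, q) | ('times', p, q) | ('star', p)

def deriv(p):
    tag = p[0]
    if tag == 'const':
        return ('const', INF)                        # d a/dx = 0   (0 = INF here)
    if tag == 'var':
        return ('const', 0)                          # d x/dx = 1   (1 = 0 here)
    if tag == 'plus':
        return ('plus', deriv(p[1]), deriv(p[2]))    # sum rule
    if tag == 'times':                               # product rule
        return ('plus', ('times', p[1], deriv(p[2])),
                        ('times', p[2], deriv(p[1])))
    if tag == 'star':
        return ('times', p, deriv(p[1]))             # d p*/dx = p* . dp/dx

def ev(p, v):
    tag = p[0]
    if tag == 'const': return p[1]
    if tag == 'var':   return v
    if tag == 'plus':  return min(ev(p[1], v), ev(p[2], v))
    if tag == 'times': return ev(p[1], v) + ev(p[2], v)
    if tag == 'star':  return 0     # a* = min(0, a, 2a, ...) = 0 for a >= 0

x = ('var',)
polys = [x, ('times', x, x), ('plus', ('times', x, x), ('const', 3)),
         ('star', ('times', x, x))]

# Taylor's formula p(a + b) = p(a) + b . (dp/dx)(a + b) reads, in
# min-plus: p(min(a, b)) = min(p(a), b + dp(min(a, b))).
for p in polys:
    dp = deriv(p)
    for a in range(6):
        for b in range(6):
            assert ev(p, min(a, b)) == min(ev(p, a), b + ev(dp, min(a, b)))
```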

The category K^× of finite powers of K has the natural numbers n as objects. The morphisms in K^×[m, n] are the n-tuples of polynomials (p₁, ..., pₙ) with p₁, ..., pₙ ∈ K[x₁, ..., xₘ]. Composition of morphisms is the usual composition of polynomials.

**Lemma 12.** *The category* K^× *is a cartesian category, endowed with a change action model* α : K^× → CAct(K^×) *given by* α(K) := (K, K, +, +, 0) *and* α(Kⁱ) := α(K)ⁱ*; for* p = (p₁(x̄), ..., pₙ(x̄)) : Kᵐ → Kⁿ*,* α(p) := (p, (p′₁, ..., p′ₙ))*, where* p′ᵢ = p′ᵢ(x₁, ..., xₘ, y₁, ..., yₘ) := Σ_{j=1}^{m} yⱼ · (∂pᵢ/∂xⱼ)(x₁ + y₁, ..., xₘ + yₘ)*.*

*Remark 3.* Interestingly, derivatives here are not additive in the second argument. Take p(x) = x². Then, in general, ∂p(a, b + c) > ∂p(a, b) + ∂p(a, c). It follows that K[x̄] cannot be modelled by a GCDC (because of axiom [CD.2]).

#### **6** *ω***-Change Actions and** *ω***-Differential Maps**

A change action model α : **C** → CAct(**C**) is a category that supports higher-order differentials: each **C**-object A is associated with an ω-sequence of change actions—α(A), α(ΔA), α(Δ²A), ...—in which every change action is compatible with its neighbours. We introduce ω-*change actions* as a means of constructing change action models "freely": given a cartesian category **C**, the objects of the category CAct_ω(**C**) are all ω-sequences of "contiguously compatible" change actions.

We work with ω-sequences [Aᵢ]_{i∈ω} and [fᵢ]_{i∈ω} of objects and morphisms in **C**. We write pₖ([Aᵢ]_{i∈ω}) := Aₖ for the k-th element of the ω-sequence (similarly for pₖ([fᵢ]_{i∈ω})), and omit the subscript 'i ∈ ω' from [Aᵢ]_{i∈ω} to reduce clutter. Given ω-sequences [Aᵢ] and [Bᵢ] of objects of a cartesian category **C**, define the ω-sequences *product* [Aᵢ] × [Bᵢ], *left shift* Π[Aᵢ] and *derivative space* **D**[Aᵢ] by:

$$\begin{aligned} \mathfrak{p}\_j([A\_i] \times [B\_i]) &:= A\_j \times B\_j \qquad \mathfrak{p}\_j(\varPi[A\_i]) := A\_{j+1} \\ \mathfrak{p}\_0(\mathbf{D}[A\_i]) &:= A\_0 \qquad \mathfrak{p}\_{j+1}\mathbf{D}[A\_i] := \mathfrak{p}\_j\mathbf{D}[A\_i] \times \mathfrak{p}\_j\mathbf{D}(\varPi[A\_i]) \end{aligned}$$

*Example 2.* Given an ω-sequence [Aᵢ], the first few terms of **D**[Aᵢ] are:

$$\begin{aligned} \mathfrak{p}\_0 \mathbf{D}[A\_i] &= A\_0 & \mathfrak{p}\_1 \mathbf{D}[A\_i] &= A\_0 \times A\_1 & \mathfrak{p}\_2 \mathbf{D}[A\_i] &= (A\_0 \times A\_1) \times (A\_1 \times A\_2) \\ \mathfrak{p}\_3 \mathbf{D}[A\_i] &= \left( (A\_0 \times A\_1) \times (A\_1 \times A\_2) \right) \times \left( (A\_1 \times A\_2) \times (A\_2 \times A\_3) \right) \end{aligned}$$
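The recursion defining **D**[Aᵢ] is easy to animate. The following Python sketch (our encoding: the object Aᵢ is represented by the index i, products by pairs) reproduces the shapes in Example 2:

```python
# Unfolding the derivative space: p_0 D[A] = A_0 and
# p_{j+1} D[A] = p_j D[A] x p_j D(Pi[A]), where Pi shifts indices by one.
# We represent A_i by the integer i and products by pairs. Sketch only.

from functools import lru_cache

@lru_cache(maxsize=None)
def p_D(j, shift=0):
    """p_j D(Pi^shift [A_i]): the integer i stands for the object A_i."""
    if j == 0:
        return shift                              # p_0 D(Pi^s [A]) = A_s
    return (p_D(j - 1, shift), p_D(j - 1, shift + 1))

assert p_D(0) == 0                                # A_0
assert p_D(1) == (0, 1)                           # A_0 x A_1
assert p_D(2) == ((0, 1), (1, 2))                 # (A_0 x A_1) x (A_1 x A_2)
assert p_D(3) == (((0, 1), (1, 2)), ((1, 2), (2, 3)))
```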

**Definition 9.** Given ω-sequences [Aᵢ] and [Bᵢ], a *pre*-ω-*differential map* between them, written [fᵢ] : [Aᵢ] → [Bᵢ], is an ω-sequence [fᵢ] such that, for each j, fⱼ : pⱼ**D**[Aᵢ] → Bⱼ is a **C**-morphism.

We explain the intuition behind the derivative space **D**[Aᵢ]. Take a morphism f : A → B, and set Aᵢ = ΔⁱA (where Δ⁰A := A and Δⁿ⁺¹A := Δ(ΔⁿA)). Since Δ distributes over products, the domain of the n-th derivative of f is pₙ**D**[Aᵢ].

*Notation*. Define π₁^⟨0⟩ := π₁ and π₁^⟨j+1⟩ := π₁^⟨j⟩ × π₁^⟨j⟩; and define π₂⁽⁰⁾ := Id and π₂⁽ʲ⁺¹⁾ := π₂ ∘ π₂⁽ʲ⁾.

**Definition 10.** Let [fᵢ] : [Aᵢ] → [Bᵢ] and [gᵢ] : [Bᵢ] → [Cᵢ] be pre-ω-differential maps. The *derivative sequence* **D**[fᵢ] is the ω-sequence defined by:

$$\mathfrak{p}\_j \mathbf{D}[f\_i] := \langle f\_j \circ \pi\_1^{\langle j \rangle}, f\_{j+1} \rangle : \mathfrak{p}\_{j+1} \mathbf{D}[A\_i] \to B\_j \times B\_{j+1}$$

Using the shorthand **D**ⁿ[fᵢ] := **D**(⋯(**D**[fᵢ])⋯) (n applications of **D**), the *composite* [gᵢ] ∘ [fᵢ] : [Aᵢ] → [Cᵢ] is the pre-ω-differential map given by pⱼ([gᵢ] ∘ [fᵢ]) := gⱼ ∘ p₀(**D**ʲ[fᵢ]). The *identity* pre-ω-differential map Id : [Aᵢ] → [Aᵢ] is defined by pⱼ Id := π₂⁽ʲ⁾ : pⱼ**D**[Aᵢ] → Aⱼ.

*Example 3.* Consider pre-ω-differential maps [fᵢ] and [gᵢ] as above. Then:

$$\begin{aligned} \mathfrak{p}\_{0}\mathbf{D}[f\_{i}] &= \langle f\_{0}\circ\pi\_{1}^{\langle 0\rangle}, f\_{1}\rangle & \mathfrak{p}\_{1}\mathbf{D}[f\_{i}] &= \langle f\_{1}\circ\pi\_{1}^{\langle 1\rangle}, f\_{2}\rangle \\ \mathfrak{p}\_{0}\mathbf{D}^{2}[f\_{i}] &= \langle \langle f\_{0}\circ\pi\_{1}^{\langle 0\rangle}, f\_{1}\rangle\circ\pi\_{1}, \langle f\_{1}\circ\pi\_{1}^{\langle 1\rangle}, f\_{2}\rangle\rangle \\ \mathfrak{p}\_{1}\mathbf{D}^{2}[f\_{i}] &= \langle \langle f\_{1}\circ\pi\_{1}^{\langle 1\rangle}, f\_{2}\rangle\circ\pi\_{1}^{\langle 1\rangle}, \langle f\_{2}\circ\pi\_{1}^{\langle 2\rangle}, f\_{3}\rangle\rangle \\ \mathfrak{p}\_{0}\mathbf{D}^{3}[f\_{i}] &= \langle \mathfrak{p}\_{0}\mathbf{D}^{2}[f\_{i}]\circ\pi\_{1}^{\langle 0\rangle}, \langle\langle f\_{1}\circ\pi\_{1}^{\langle 1\rangle}, f\_{2}\rangle\circ\pi\_{1}^{\langle 1\rangle}, \langle f\_{2}\circ\pi\_{1}^{\langle 2\rangle}, f\_{3}\rangle\rangle\rangle \end{aligned}$$

It follows that the first few terms of the composite [gᵢ] ∘ [fᵢ] are:

$$\mathfrak{p}\_0([g\_i]\circ[f\_i]) = g\_0\circ f\_0 \qquad \mathfrak{p}\_1([g\_i]\circ[f\_i]) = g\_1\circ\langle f\_0\circ\pi\_1^{\langle0\rangle}, f\_1\rangle$$

$$\mathfrak{p}\_2([g\_i]\circ[f\_i]) = g\_2\circ\langle\langle f\_0\circ\pi\_1, f\_1\rangle\circ\pi\_1^{\langle0\rangle}, \langle f\_1\circ\pi\_1^{\langle1\rangle}, f\_2\rangle\rangle$$

Notice that these correspond to iterations of the chain rule, assuming fᵢ₊₁ = ∂fᵢ and gᵢ₊₁ = ∂gᵢ.
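This observation can be spot-checked with finite differences on ℤ: setting f₁ = ∂f₀ and g₁ = ∂g₀, the level-1 component of the composite, g₁ ∘ ⟨f₀ ∘ π₁, f₁⟩, should equal the derivative of g₀ ∘ f₀. A hedged Python sketch (names ours, one concrete instance rather than the general claim):

```python
# Checking the chain-rule reading of the composite at level 1: with
# f_1 = df_0 and g_1 = dg_0 on (Z, Z, +), the composite's level-1 map
# g_1 o <f_0 o pi_1, f_1> agrees with the derivative of g_0 o f_0.
# Illustrative instance, not the general proof.

def deriv(f):
    """df(x, dx) = f(x + dx) - f(x)."""
    return lambda x, dx: f(x + dx) - f(x)

f0 = lambda x: x * x            # f_0 : Z -> Z
g0 = lambda y: y ** 3           # g_0 : Z -> Z
f1, g1 = deriv(f0), deriv(g0)   # f_1 = df_0, g_1 = dg_0

# p_1([g_i] o [f_i]) = g_1 o <f_0 o pi_1, f_1> applied to (x, dx):
p1_comp = lambda x, dx: g1(f0(x), f1(x, dx))

d_gf = deriv(lambda x: g0(f0(x)))
assert all(p1_comp(x, dx) == d_gf(x, dx)
           for x in range(-4, 5) for dx in range(-4, 5))
```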

**Proposition 3.** *For any pre-*ω*-differential map* [fᵢ]*,* Id ∘ [fᵢ] = [fᵢ] ∘ Id = [fᵢ]*.*

**Proposition 4.** *Composition of pre-*ω*-differential maps is associative: given pre-*ω*-differential maps* [fᵢ] : [Aᵢ] → [Bᵢ]*,* [gᵢ] : [Bᵢ] → [Cᵢ] *and* [hᵢ] : [Cᵢ] → [Dᵢ]*, for all* n ≥ 0 *we have* hₙ ∘ p₀**D**ⁿ([gᵢ] ∘ [fᵢ]) = (hₙ ∘ p₀**D**ⁿ[gᵢ]) ∘ p₀**D**ⁿ[fᵢ]*.*

**Definition 11.** Given pre-ω-differential maps [fᵢ] : [Aᵢ] → [Bᵢ] and [gᵢ] : [Aᵢ] → [Cᵢ], the *pairing* ⟨[fᵢ], [gᵢ]⟩ : [Aᵢ] → [Bᵢ] × [Cᵢ] is the pre-ω-differential map defined by pⱼ⟨[fᵢ], [gᵢ]⟩ := ⟨fⱼ, gⱼ⟩. Define the pre-ω-differential maps **π₁** := [π₁ᵢ] : [Aᵢ] × [Bᵢ] → [Aᵢ] by pⱼ[π₁ᵢ] := π₁ ∘ π₂⁽ʲ⁾, and **π₂** := [π₂ᵢ] : [Aᵢ] × [Bᵢ] → [Bᵢ] by pⱼ[π₂ᵢ] := π₂ ∘ π₂⁽ʲ⁾.

**Definition 12.** A *pre*-ω-*change action* on a cartesian category **C** is a quadruple A = ([Aᵢ], [⊕^{Aᵢ}], [+^{Aᵢ}], [0^{Aᵢ}]), where [Aᵢ] is an ω-sequence of **C**-objects and, for each j ≥ 0, ⊕^{Aⱼ} and +^{Aⱼ} are ω-sequences, satisfying


4. Δ(A, j) := (Aⱼ, Aⱼ₊₁, p₀⊕^{Aⱼ}, p₀+^{Aⱼ}, 0^{Aⱼ}) is a change action in **C**.

We extend the left-shift operation to pre-ω-change actions by defining ΠA := (Π[Aᵢ], Π[⊕^{Aᵢ}], Π[+^{Aᵢ}], Π[0^{Aᵢ}]). Then we define the change actions **D**(A, j) inductively by **D**(A, 0) := Δ(A, 0) and **D**(A, j + 1) := Δ(A, j) × Δ(ΠA, j). Notice that the carrier object of **D**(A, j) is the j-th element of the ω-sequence **D**[Aᵢ].

**Definition 13.** Given pre-ω-change actions A and B (using the preceding notation), a pre-ω-differential map [fᵢ] : [Aᵢ] → [Bᵢ] is ω-*differential* if, for each j ≥ 0, (fⱼ, fⱼ₊₁) is a differential map from the change action **D**(A, j) to Δ(B, j). Whenever [fᵢ] is an ω-differential map, we write f : A → B.

We say that a pre-ω-change action A is an ω-*change action* if, for each i ≥ 0, ⊕^{Aᵢ} and +^{Aᵢ} are ω-differential maps.<sup>8</sup>

<sup>8</sup> It is important to sequence the definitions appropriately. Notice that we only define ω-differential maps once there is a notion of pre-ω-change action, but pre-ω-change actions need pre-ω-differential maps to make sense of the monoid sum +^{Aⱼ} and action ⊕^{Aⱼ}.

*Remark 4.* The reason for requiring each ⊕^{Aᵢ} and +^{Aᵢ} in an ω-change action A to be ω-differential is so that A is *internally* a change action in CAct_ω(**C**) (see Definition 15).

**Lemma 13.** *Let* f : A → B *and* g : B → C *be* ω*-differential maps. Qua pre-*ω*-differential maps, their composite* [gᵢ] ∘ [fᵢ] *is* ω*-differential. Setting* g ∘ f := [gᵢ] ∘ [fᵢ] : A → C*, it follows that composition of* ω*-differential maps is associative.*

**Lemma 14.** *For any* ω*-change action* A*, the pre-*ω*-differential map* Id : [Aᵢ] → [Aᵢ] *is* ω*-differential. Hence* Id : A → A *satisfies the identity laws.*

**Definition 14.** Given ω-change actions A and B, we define the *product* ω-*change action* A × B := ([Aᵢ × Bᵢ], [⊕ᵢ], [+ᵢ], [0ᵢ]) where

1. ⊕ⱼ := (⊕^{Aⱼ} × ⊕^{Bⱼ}) ∘ ⟨⟨π₁₁, π₁₂⟩, ⟨π₂₁, π₂₂⟩⟩
2. +ⱼ := (+^{Aⱼ} × +^{Bⱼ}) ∘ ⟨⟨π₁₁, π₁₂⟩, ⟨π₂₁, π₂₂⟩⟩
3. 0ⱼ := ⟨0^{Aⱼ}, 0^{Bⱼ}⟩

Notice that Δ(A × B, j) := (Aⱼ × Bⱼ, Aⱼ₊₁ × Bⱼ₊₁, p₀⊕ⱼ, p₀+ⱼ, 0ⱼ) is a change action in **C** by construction.

**Lemma 15.** *The pre-*ω*-differential maps* **π₁**, **π₂** *are* ω*-differential. Moreover, for any* ω*-differential maps* f : A → B *and* g : A → C*, the map* ⟨f, g⟩ := ⟨[fᵢ], [gᵢ]⟩ *is* ω*-differential, satisfying* **π₁** ∘ ⟨f, g⟩ = f *and* **π₂** ∘ ⟨f, g⟩ = g*.*

**Definition 15.** Define the functor CAct<sup>ω</sup> : **Cat**<sup>×</sup> → **Cat**<sup>×</sup> as follows.


**Theorem 8.** *The category* CActω(**C**) *is cartesian, with product given in Definition 14. Moreover if* **C** *is closed and has countable limits,* CActω(**C**) *is cartesian closed.*

**Theorem 9.** *The category* CAct_ω(**C**) *is equipped with a canonical change action model* γ : CAct_ω(**C**) → CAct(CAct_ω(**C**))*.*

**Theorem 10 (Relativised final coalgebra).** *Let* **C** *be a change action model. The canonical change action model* γ : CAct_ω(**C**) → CAct(CAct_ω(**C**)) *is a relativised*<sup>9</sup> *final coalgebra of* (CAct, ε)*.*

*That is, for every change action model* α : **C** → CAct(**C**) *on* **C***, there is a unique coalgebra homomorphism* α_ω : **C** → CAct_ω(**C**)*, as witnessed by the commuting diagram:*

$$
\begin{array}{ccc}
\mathbf{C} & \xrightarrow{\ \alpha\ } & \mathrm{CAct}(\mathbf{C}) \\
{\scriptstyle \exists!\,\alpha\_\omega} \downarrow & & \downarrow {\scriptstyle \mathrm{CAct}(\alpha\_\omega)} \\
\mathrm{CAct}\_\omega(\mathbf{C}) & \xrightarrow{\ \gamma\ } & \mathrm{CAct}(\mathrm{CAct}\_\omega(\mathbf{C}))
\end{array}
$$

<sup>9</sup> Here CAct is restricted to the full subcategory of **Cat***<sup>×</sup>* with **<sup>C</sup>** as the only object.

*Proof.* We first exhibit the functor α_ω : **C** → CAct_ω(**C**).

Take a **<sup>C</sup>**-morphism f : A <sup>→</sup> B. We define the ω-differential map αω(f) := f : A <sup>→</sup> B , where A := [Ai], [<sup>⊕</sup> <sup>i</sup>], [+ <sup>i</sup>], [0i] is the ω-change action determined by A under *iterative actions of* <sup>α</sup>. I.e. for each <sup>i</sup> <sup>≥</sup> 0: <sup>A</sup><sup>i</sup> := <sup>Δ</sup><sup>i</sup> A (by abuse of notation, we write ΔA to mean the carrier object of the monoid of the internal change action α(A ), for any **<sup>C</sup>**-object A ); <sup>⊕</sup> <sup>j</sup> : <sup>Π</sup><sup>j</sup> [Ai]×Πj+1[Ai] <sup>→</sup> <sup>Π</sup><sup>j</sup> [Ai] is specified by: <sup>p</sup>k<sup>⊕</sup> <sup>j</sup> is the monoid action morphism of <sup>α</sup>(Aj+k); <sup>+</sup> <sup>j</sup> : <sup>Π</sup>j+1[Ai]×Πj+1[Ai] <sup>→</sup> Πj+1[Ai] is specified by: <sup>p</sup>k<sup>⊕</sup> <sup>j</sup> is the monoid sum morphism of <sup>α</sup>(Aj+k); 0<sup>i</sup> is the zero object of α(A<sup>i</sup>).

The ω-sequence [fᵢ] is defined by induction: f₀ := f; assuming fₙ : pₙ**D**[Aᵢ] → Bₙ is defined and α(fₙ) = (fₙ, ∂fₙ), define fₙ₊₁ := ∂fₙ.

To see that the diagram commutes, notice that γ([fᵢ]) = ([fᵢ], Π[fᵢ]) and that CAct(α_ω) maps α(f) = (f, ∂f) to ([fᵢ], [(∂f)ᵢ]); then observe that Π[fᵢ] = [(∂f)ᵢ] follows from the construction of [fᵢ].

Finally, to see that the functor α_ω is unique, consider the **C**-morphisms ∂ⁿf (n = 0, 1, 2, ...), where α(∂ⁿf) = (∂ⁿf, ∂ⁿ⁺¹f). Suppose β : **C** → CAct_ω(**C**) is another coalgebra homomorphism. Thanks to the commuting diagram, we must have Πⁿβ(f) = β(∂ⁿf), and so, in particular, pₙ(β(f)) = p₀(Πⁿβ(f)) = p₀(β(∂ⁿf)) = ∂ⁿf for each n ≥ 0. Thus α_ω(f) = β(f), as desired. □

Intuitively any change action model on **C** is always a "subset" of the change action model on CActω(**C**).

**Theorem 11.** *The category* CActω(**C**) *is the limit in* **Cat**<sup>×</sup> *of the diagram.*

#### **7 Related Work, Future Directions and Conclusions**

The present work directly expands upon work by the authors and others in [2], where the notion of change action was developed in the context of the incremental evaluation of Datalog programs. This work generalises some results in [2] and addresses two significant questions that had been left open, namely: how to construct cartesian closed categories of change actions, and how to formalise higher-order derivatives.

Our work is also closely related to Cockett, Seely and Cruttwell's work on cartesian differential categories [3,4,7] and Cruttwell's more recent work on generalised cartesian differential categories [10]. Both cartesian differential categories and change action models aim to provide a setting for differentiation, and the construction of ω-change actions resembles the Faà di Bruno construction [8,10] (especially its recent reformulation by Lemay [20]) which, given an arbitrary category **C**, builds a cofree cartesian differential category for it. The main difference between these two settings lies in the specific axioms required (change action models are significantly weaker: see Remark 2).

In this sense, the derivative condition is close to the Kock-Lawvere axiom from synthetic differential geometry [18,19], which has provided much of the driving intuition behind this work, and making this connection precise is the subject of ongoing research.

In a different direction, the simplicity of products and exponentials in closed change action models (see Theorem 5) suggests that there should be a reasonable calculus for change action models. Exploring such a calculus and its connections to the differential λ-calculus [11] could lead to practical applications to languages for incremental computation or higher-order automatic differentiation [16].

In conclusion, change actions and change action models constitute a new setting for reasoning about differentiation that is able to unify "discrete" and "continuous" models, as well as higher-order functions. Change actions are remarkably well-behaved and show tantalising connections with geometry and 2-categories. We believe that most ad hoc notions of derivative found in disparate subjects can be elegantly integrated into the framework of change action models, and we expect further work in this area to benefit those notions as well.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Coalgebra Learning via Duality**

Simone Barlocco<sup>1</sup>, Clemens Kupke1(B), and Jurriaan Rot<sup>2</sup>

¹ University of Strathclyde, Glasgow, UK
{simone.barlocco,clemens.kupke}@strath.ac.uk
² Radboud University, Nijmegen, Netherlands
j.rot@cs.ru.nl

**Abstract.** Automata learning is a popular technique for inferring minimal automata through membership and equivalence queries. In this paper, we generalise learning to the theory of coalgebras. The approach relies on the use of logical formulas as tests, based on a dual adjunction between states and logical theories. This allows us to learn, e.g., labelled transition systems, using Hennessy-Milner logic. Our main contribution is an abstract learning algorithm, together with a proof of correctness and termination.

#### **1 Introduction**

In recent years, automata learning has been applied with considerable success to infer models of systems, in order to analyse and verify them. Most current approaches to active automata learning are ultimately based on the original algorithm due to Angluin [4], although numerous improvements have been made, both in practical performance and in extending the techniques to different models [30].

Our aim is to move from automata to *coalgebras* [14,26], providing a generalisation of learning to a wide range of state-based systems. The key insight underlying our work is that dual adjunctions connecting coalgebras and tailor-made logical languages [12,19,21,22,26] allow us to devise a generic learning algorithm for coalgebras that is parametric in the type of system under consideration. Our approach gives rise to a fundamental distinction between *states* of the learned system and *tests*, modelled as logical formulas. This distinction is blurred in the classical DFA algorithm, where tests are also used to specify the (reachable) states. It is precisely the distinction between tests and states which allows us to move beyond classical automata, and use, for instance, Hennessy-Milner logic to learn bisimilarity quotients of labelled transition systems.

To present learning via duality we need to introduce new notions and refine existing ones. First, in the setting of coalgebraic modal logic, we introduce the new notion of *sub-formula closed* collections of formulas, generalising suffix-closed sets of words in Angluin's algorithm (Sect. 4). Second, we import the abstract notion of *base* of a functor from [8], which allows us to speak about 'successor states' (Sect. 5). In particular, the base allows us to characterise *reachability* of coalgebras in a clear and concise way. This yields a canonical procedure for computing the reachable part from a given initial state in a coalgebra, thus generalising the notion of a generated subframe from modal logic.

© The Author(s) 2019

C. Kupke—Partially supported by EPSRC grant EP/N015843/1.

M. Bojańczyk and A. Simpson (Eds.): FOSSACS 2019, LNCS 11425, pp. 62–79, 2019. https://doi.org/10.1007/978-3-030-17127-8_4

We then rephrase *coalgebra learning* as the problem of inferring a coalgebra which is reachable, minimal and which cannot be distinguished from the original coalgebra held by the teacher using tests. This requires suitably adapting the computation of the reachable part to incorporate tests, and only learn 'up to logical equivalence'. We formulate the notion of *closed table*, and an associated procedure to close tables. With all these notions in place, we can finally define our abstract algorithm for coalgebra learning, together with a proof of correctness and termination (Sect. 6). Overall, we consider this correctness and termination proof as the main contribution of the paper; other contributions are the computation of reachability via the base and the notion of sub-formula closedness. At a more conceptual level, our paper shows how states and tests interact in automata learning, by rephrasing it in the context of a dual adjunction connecting coalgebra (systems) and algebra (logical theories). As such, we provide a new foundation of learning state-based systems.

*Related Work.* The idea that tests in the learning algorithm should be formulas of a distinct logical language was proposed first in [6]. However, the work in *loc.cit.* is quite ad hoc, confined to Boolean-valued modal logics, and did not explicitly use duality. This paper is a significant improvement: the dual adjunction framework and the definition of the base [8] enable us to present a description of Angluin's algorithm in purely categorical terms, including a proof of correctness and, crucially, termination. Our abstract notion of logic also enables us to recover *exactly* the standard DFA algorithm (where tests are words) and the algorithm for learning Mealy machines (where tests are many-valued), something that is not possible in [6] where tests are modal formulas. Closely related to our work is also the line of research initiated by [15] and followed up within the CALF project [11–13], which applies ideas from category theory to automata learning. Our approach is orthogonal to CALF: the latter focuses on learning a general version of *automata*, whereas our work is geared towards learning bisimilarity quotients of state-based transition systems. While CALF lends itself to studying automata in a large variety of base categories, our work thus far is concerned with varying the type of transition structures.

#### **2 Learning by Example**

The aim of this section is twofold: (i) to remind the reader of the key elements of Angluin's L<sup>∗</sup> algorithm [4] and (ii) to motivate and outline our generalisation.

In the classical L<sup>∗</sup> algorithm, the learner tries to learn a regular language L over some alphabet A or, equivalently, a DFA A accepting that language. Learning proceeds by asking queries to a teacher who has access to this automaton. *Membership queries* allow the learner to test whether a given word is in the language, and *equivalence queries* to test whether the correct DFA has been learned already. The algorithm constructs so-called tables (S, E) where S, E ⊆ A<sup>∗</sup> are the rows and columns of the table, respectively. The value at position (s, e) of the table is the answer to the membership query "se ∈ L?".

Words play a double role: on the one hand, a word w ∈ S represents the state which is reached when reading w at the initial state. On the other hand, the set E represents the set of membership queries that the learner is asking about the states in S. A table is *closed* if for all w ∈ S and all a ∈ A either wa ∈ S or there is a state v ∈ S such that wa is equivalent to v w.r.t. membership queries of words in E. If a table is not closed we extend S by adding words of the form wa for w ∈ S and a ∈ A. Once it is closed, one can define a *conjecture*,¹ i.e., a DFA with states in S. The learner now asks the teacher whether the conjecture is correct. If it is, the algorithm terminates. Otherwise the teacher provides a *counterexample*: a word on which the conjecture is incorrect. The table is now extended using the counterexample. As a result, the table is no longer closed and the algorithm continues by closing the table.
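The closedness check described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: the names `row`, `close_table` and the `member` oracle are ours, and the table is represented simply by its sets of row words S and column words E.

```python
# Sketch of the classical L* closedness check: rows S and columns E are sets
# of words, and member(w) answers the teacher's membership query "w in L?".

def row(member, w, E):
    """The observation-table row of word w: its answers on all columns."""
    return tuple(member(w + e) for e in sorted(E))

def close_table(member, S, E, alphabet):
    """Extend S until, for every w in S and a in alphabet, the row of wa
    equals the row of some v in S (the table-closedness condition)."""
    S = set(S)
    changed = True
    while changed:
        changed = False
        for w in list(S):
            for a in alphabet:
                wa = w + a
                if all(row(member, wa, E) != row(member, v, E) for v in S):
                    S.add(wa)           # wa is a genuinely new row: add it
                    changed = True
    return S
```

For instance, with the language "even number of a's" and columns {ε, a}, closing the table from {ε} adds only the row for a, since every other word's row coincides with one already present.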

Our version of L<sup>∗</sup> introduces some key conceptual differences: tables are pairs (S, Ψ) such that S (set of rows) is a selection of states of A and Ψ (set of columns) is a collection of tests/formulas. Membership queries become checks of tests in Ψ at states in S, and equivalence queries verify whether or not the learned structure is logically equivalent to the original one. A table (S, Ψ) is closed if for all successors x′ of elements of S there exists an x ∈ S such that x and x′ are equivalent w.r.t. formulas in Ψ. The clear distinction between states and tests in our algorithm means that counterexamples are formulas that have to be added to Ψ. Crucially, the move from words to formulas allows us to use the rich theory of coalgebra and coalgebraic logic to devise a generic algorithm.

We consider two examples within our generic framework: classical DFAs, yielding essentially the L<sup>∗</sup> algorithm, and labelled transition systems, which is to the best of our knowledge not covered by standard automata learning algorithms.

For the DFA case, let L = {u ∈ {a, b}<sup>∗</sup> | number of a's mod 3 = 0} and assume that the teacher uses the following (infinite) automaton describing L:

As outlined above, the learner starts to construct tables (S, Ψ) where S is a selection of states of the automaton and Ψ are formulas. For DFAs we will see (Example 1) that our formulas are just words in {a, b}<sup>∗</sup>. Our starting table is ({q0}, ∅), i.e., we select the initial state and do not check any logical properties. This table is trivially closed, as all states are equivalent w.r.t. ∅. The first conjecture is the automaton consisting of one accepting state q<sup>0</sup> with a- and b-loops, whose language is {a, b}<sup>∗</sup>. This is incorrect and the teacher provides, e.g., aa as counterexample. The resulting table is ({q0}, {ε, a, aa}) where the

¹ The algorithm additionally requires *consistency*, but this is not needed if counterexamples are added to E. This idea goes back to [22].

second component was generated by closing {aa} under suffixes. Suffix closedness features both in the original L∗ algorithm and in our framework (Sect. 4). The table ({q0}, {ε, a, aa}) is not closed as q1, the a-successor of q0, does not accept ε whereas q0 does. Therefore we extend the table to ({q0, q1}, {ε, a, aa}). Note that, unlike in the classical setting, exploring successors of already selected states cannot be achieved by appending letters to words; instead we need to *locally* employ the transition structure of the automaton A. A similar argument shows that we need to extend the table further to ({q0, q1, q2}, {ε, a, aa}), which is closed. This leads to the (correct) conjecture depicted on the right below. The acceptance condition and transition structure have been read off from the original automaton, where the transition from q2 to q0 is obtained by realising that q2's successor q3 is represented by the equivalent state q0 ∈ S.
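The trace above can be replayed concretely. The sketch below is our own illustration, not the paper's code: it encodes the teacher's infinite automaton for L by counting a's, and closes the table by exploring successors locally, as described, rather than by appending letters to words.

```python
# State-based variant of closedness on the mod-3 example: states of the
# teacher's (infinite) automaton are q_n = "n a's read so far", tests are
# words, and a state passes test w iff reading w from it ends accepting.

def step(n, c):                  # transition of the teacher's automaton
    return n + 1 if c == "a" else n

def accepts(n, w):               # does state q_n pass the test word w?
    for c in w:
        n = step(n, c)
    return n % 3 == 0

def theory(n, tests):            # theory map restricted to the tests in Psi
    return tuple(accepts(n, w) for w in sorted(tests))

def close(S, tests, alphabet="ab"):
    """Add successor states until every successor is represented in S
    up to equivalence w.r.t. the tests."""
    S = list(S)
    for n in S:                  # explore successors locally
        for c in alphabet:
            m = step(n, c)
            if all(theory(m, tests) != theory(v, tests) for v in S):
                S.append(m)
    return S
```

Closing the table ({q0}, {ε, a, aa}) indeed yields the three states q0, q1, q2, matching the closed table in the text; q3 is not added because it is equivalent to q0.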

A key feature of our work is that the L∗ algorithm can be systematically generalised to new settings, in particular, to the learning of bisimulation quotients of transition systems. Consider the following labelled transition system (LTS). We would like to learn its minimal representation, i.e., its quotient modulo bisimulation.

Our setting allows us to choose a suitable logical language. For LTSs, the language consists of the formulas of standard multi-modal logic (cf. Example 3). The semantics is as usual: ⟨a⟩φ holds at a state if it has an a-successor that makes φ true.

As above, the algorithm constructs tables, starting with (S = {x0}, Ψ = ∅). The table is closed, so the first conjecture is a single state with an a-loop on which no proposition letter is true (note that x0 has no b- or c-successor and no proposition is true at x0). It is, however, easy for the teacher to find a counterexample. For example, the formula ⟨a⟩⟨b⟩⊤ is true at the root of the original LTS but false in the conjecture. We add the counterexample and all its subformulas to Ψ and obtain a new table ({x0}, Ψ′) with Ψ′ = {⟨a⟩⟨b⟩⊤, ⟨b⟩⊤, ⊤}. Now the table is not closed, as x0 has a successor x1 that satisfies ⟨b⟩⊤ whereas x0 does not. Therefore we add x1 to the table to obtain ({x0, x1}, Ψ′). Similar arguments lead to the closed table ({x0, x1, x3, x4}, Ψ′), which also yields the correct conjecture. Note that the state x2 does not get added to the table, as it is equivalent to x1 and thus already represented. This demonstrates a remarkable fact: we computed the bisimulation quotient of the LTS without inspecting the (infinite) right-hand side of the LTS.
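Since the LTS figure is not reproduced here, the following sketch uses a small hypothetical LTS to show how the closedness check works when tests are modal formulas; the formula encoding and all names are our own illustration.

```python
# Evaluating modal formulas on a finite LTS, as used for closedness checks.
# Formulas are nested tuples: True stands for ⊤, ("dia", a, phi) for ⟨a⟩phi.
# The LTS below is a hypothetical stand-in for the paper's figure.

lts = {
    "x0": {"a": {"x1"}},
    "x1": {"b": {"x2"}},
    "x2": {},
}

def holds(x, phi):
    if phi is True:                       # ⊤ holds everywhere
        return True
    _, a, sub = phi                       # ⟨a⟩sub: some a-successor satisfies sub
    return any(holds(y, sub) for y in lts.get(x, {}).get(a, set()))

def theory(x, psi):
    """Theory of state x restricted to the collection psi of formulas."""
    return tuple(holds(x, phi) for phi in psi)

# the counterexample ⟨a⟩⟨b⟩⊤ together with its subformulas
psi = [("dia", "a", ("dia", "b", True)), ("dia", "b", True), True]
```

Here the theories of x0 and x1 over Ψ differ (x1 satisfies ⟨b⟩⊤ while x0 does not), so a table containing only x0 is not closed and x1 must be added, mirroring the narrative above.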

Another important example that fits smoothly into our framework is the well-known variant of Angluin's algorithm for learning Mealy machines (Example 2). Thanks to our general notion of logic, our framework allows us to use an intuitive language, where a formula is simply an input word w whose truth value at a state x is the observed output after entering w at x. This is in contrast to [6], where formulas had to be Boolean-valued. Multi-valued logics fit naturally into our setting; we expect this to be useful for dealing with systems with quantitative information.

# **3 Preliminaries**

The general learning algorithm in this paper is based on the theory of *coalgebras*, which provides an abstract framework for representing state-based transition systems. In what follows we assume that the reader is familiar with basic notions of category theory and coalgebras [14,26]. We briefly recall the notion of pointed coalgebra, modelling a coalgebra with an initial state. Let C be a category with a terminal object 1 and let B : C→C be a functor. A pointed B-coalgebra is a triple (X, γ, x0) where X ∈ C and γ : X → BX and x<sup>0</sup> : 1 → X, specifying the coalgebra structure and the point ("initial state") of the coalgebra, respectively.

*Coalgebraic Modal Logic.* Modal logics are used to describe properties of statebased systems, modelled here as coalgebras. The close relationship between coalgebras and their logics is described elegantly via dual adjunctions [18,20,21,24].

Our basic setting consists of two categories C, D connected by functors P, Q forming a dual adjunction P ⊣ Q:

$$\mathcal{C}\ \underset{Q}{\overset{P}{\rightleftarrows}}\ \mathcal{D}^{\mathsf{op}}, \qquad B\colon \mathcal{C}\to\mathcal{C}, \qquad L\colon \mathcal{D}\to\mathcal{D} \quad (1)$$

In other words, we have a natural bijection C(X, QΔ) ≅ D(Δ, PX) for X ∈ C, Δ ∈ D. Moreover, we assume two functors B : C → C and L : D → D, see (1). The functor L represents the syntax of the (modalities in the) logic: assuming that L has an initial algebra α : LΦ → Φ, we think of Φ as the collection of formulas, or tests. In this logical perspective, the functor P maps an object X of C to the collection of predicates, and the functor Q maps an object Δ of D to the collection QΔ of Δ-theories.

The connection between coalgebras and their logics is specified via a natural transformation δ : LP ⇒ PB, sometimes referred to as the one-step semantics of the logic. It is used to define the semantics ⟦·⟧ : Φ → PX of the logic on a B-coalgebra (X, γ) by initiality, as in (2):

$$\begin{array}{ccc}
L\Phi & \xrightarrow{\;L\llbracket\cdot\rrbracket\;} & LPX \xrightarrow{\;\delta_X\;} PBX \\
{\scriptstyle\alpha}\downarrow & & \downarrow{\scriptstyle P\gamma} \\
\Phi & \xrightarrow[\;\exists!\,\llbracket\cdot\rrbracket\;]{} & PX
\end{array} \quad (2)$$

Furthermore, using the bijective correspondence of the dual adjunction between P and Q, the map ⟦·⟧ corresponds to a map th^γ : X → QΦ that we will refer to as the theory map of (X, γ).

The theory map can be expressed directly via a universal property, by making use of the so-called *mate* δ♭ : BQ ⇒ QL of the one-step semantics δ (cf. [18,24]). More precisely, we have δ♭ = QLε ∘ Qδ_Q ∘ η_{BQ}, where η, ε are the unit and counit of the adjunction. Then th^γ : X → QΦ is the unique morphism making (3) commute:

$$\begin{array}{ccc}
BX & \xrightarrow{\;B\,th^{\gamma}\;} & BQ\Phi \xrightarrow{\;\delta^{\flat}_{\Phi}\;} QL\Phi \\
{\scriptstyle\gamma}\uparrow & & \uparrow{\scriptstyle Q\alpha} \\
X & \xrightarrow[\;\exists!\,th^{\gamma}\;]{} & Q\Phi
\end{array} \quad (3)$$

*Example 1.* Let C = D = Set, P = Q = 2^−, the contravariant power set functor, B = 2 × (−)^A and L = 1 + A × −. In this case B-coalgebras can be thought of as deterministic automata with input alphabet A (e.g., [25]). It is well-known that the initial L-algebra is Φ = A∗ with structure α = [ε, cons] : 1 + A × A∗ → A∗, where ε selects the empty word and cons maps a pair (a, w) ∈ A × A∗ to the word aw ∈ A∗; i.e., in this example our tests are words, with the intuitive meaning that a test succeeds if the word is accepted by the given automaton. For X ∈ C, the X-component of the (one-step) semantics δ : LP ⇒ PB is defined as follows: δX(∗) = {(i, f) ∈ 2 × X^A | i = 1}, and δX(a, U) = {(i, f) ∈ 2 × X^A | f(a) ∈ U}. It is a matter of routine checking that the semantics of tests in Φ on a B-coalgebra (X, γ) is as follows: we have ⟦ε⟧ = {x ∈ X | π1(γ(x)) = 1} and ⟦aw⟧ = {x ∈ X | π2(γ(x))(a) ∈ ⟦w⟧}, where π1 and π2 are the projection maps. The theory map th^γ sends a state to the language accepted by that state in the usual way.

*Example 2.* Again let C = D = Set and consider the functors P = Q = O^−, B = (O × −)^A and L = A × (1 + −), where A and O are fixed sets, thought of as input and output alphabet, respectively. Then B-coalgebras are Mealy machines, and the initial L-algebra is given by the set A⁺ of finite non-empty words over A. For X ∈ C, the one-step semantics δX : A × (1 + O^X) → O^{BX} is defined by δX(a, inl(∗)) = λf.π1(f(a)) and δX(a, inr(g)) = λf.g(π2(f(a))). Concretely, formulas are words in A⁺; the (O-valued) semantics of w ∈ A⁺ at state x is the output o ∈ O that is produced after processing the input w from state x.
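A minimal sketch of this many-valued semantics, on a hypothetical two-state Mealy machine (the machine and all names are our own illustration):

```python
# Many-valued tests for Mealy machines: the truth value of a test word
# w in A+ at state x is the output produced on the last letter of w.

mealy = {  # state -> input -> (output, next state); a hypothetical machine
    "s0": {"a": (0, "s1"), "b": (1, "s0")},
    "s1": {"a": (1, "s0"), "b": (0, "s1")},
}

def value(x, w):
    """O-valued semantics of the test w at state x: the last output seen."""
    for c in w:
        out, x = mealy[x][c]
    return out
```

Two states are then equivalent w.r.t. a set of tests exactly when they produce the same outputs on all of them, which is the closedness criterion used by the Mealy variant of the algorithm.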

*Example 3.* Let C = Set and D = BA, where the latter denotes the category of Boolean algebras. Again P = 2^−, but this time 2^X is interpreted as a Boolean algebra. The functor Q maps a Boolean algebra to the collection of ultrafilters over it [7]. Furthermore, B = (P−)^A, where P denotes the covariant power set functor and A a set of actions. Coalgebras for this functor correspond to labelled transition systems, where a state has a set of successors that depends on the action/input from A. The dual functor L : BA → BA is defined as LY := F_BA({⟨a⟩y | a ∈ A, y ∈ Y })/≡, where F_BA : Set → BA denotes the free Boolean algebra functor and where, roughly speaking, ≡ is the congruence generated from the axioms ⟨a⟩⊥ ≡ ⊥ and ⟨a⟩(y1 ∨ y2) ≡ ⟨a⟩y1 ∨ ⟨a⟩y2 for each a ∈ A. This is explained in more detail in [21]. The initial algebra for this functor is the so-called Lindenbaum-Tarski algebra [7] of modal formulas (φ ::= ⊥ | φ ∨ φ | ¬φ | ⟨a⟩φ) quotiented by logical equivalence. The definition of an appropriate δ can be found in, e.g., [21]; the semantics ⟦·⟧ of a formula then amounts to the standard one [7].

Different types of probabilistic transition systems also fit into the dual adjunction framework, see, e.g., [17].

*Subobjects and Intersection-Preserving Functors.* We denote by Sub(X) the collection of subobjects of an object X ∈ C. Let ≤ be the order on subobjects s : S ↣ X, s′ : S′ ↣ X given by s ≤ s′ iff there is m : S → S′ s.t. s = s′ ∘ m. The *intersection* ⋂J ↣ X of a family J = {si : Si ↣ X}_{i∈I} is defined as the greatest lower bound w.r.t. the order ≤. In a complete category, it can be computed by (wide) pullback. We denote the maps in the limiting cone by xi : ⋂J → Si.

For a functor B : C → D, we say B *preserves (wide) intersections* if it preserves these wide pullbacks, i.e., if (B(⋂J), {Bxi}_{i∈I}) is the pullback of {Bsi : BSi → BX}_{i∈I}. By [2, Lemma 3.53] (building on [29]), *finitary* functors on Set 'almost' preserve wide intersections: for every such functor B there is a functor B′ which preserves wide intersections and agrees with B on all non-empty sets. Finally, if B preserves intersections, then it preserves monos.

*Minimality Notions.* The algorithm that we will describe in this paper learns a minimal and reachable representation of an object. The intuitive notions of minimality and reachability are formalised as follows.

**Definition 4.** *We call a* B*-coalgebra* (X, γ) minimal w.r.t. logical equivalence *if the theory map th*<sup>γ</sup> : <sup>X</sup> <sup>→</sup> QΦ *is a monomorphism.*

**Definition 5.** *We call a pointed* B*-coalgebra* (X, γ, x0) reachable *if for any subobject* s: S → X *and* s<sup>0</sup> : 1 → S *with* x<sup>0</sup> = s ◦ s0*: if* S *is a subcoalgebra of* (X, γ) *then* s *is an isomorphism.*

For expressive logics [27], behavioural equivalence coincides with logical equivalence. Hence, in that case, our algorithm learns a "well-pointed coalgebra" in the terminology of [2], i.e., a pointed coalgebra that is reachable and minimal w.r.t. behavioural equivalence. All logics appearing in this paper are expressive.

*Assumption on* C *and Factorisation System.* Throughout the paper we will assume that C is a complete and well-powered category. Well-powered means that for each X ∈ C the collection Sub(X) of subobjects of a given object forms a set. Our assumptions imply [10, Proposition 4.4.3] that every morphism f in C factors uniquely (up to isomorphism) as f = m ◦ e with m a mono and e a strong epi.

Recall that an epimorphism e: X → Y is strong if for every commutative square in (4) where the bottom arrow is a monomorphism, there exists a unique diagonal morphism d such that the entire diagram commutes.

$$\begin{array}{ccc}
X & \xrightarrow{\;e\;} & Y \\
\downarrow & {\scriptstyle \exists!\,d}\swarrow & \downarrow \\
U & \rightarrowtail & Z
\end{array} \quad (4)$$

# **4 Subformula Closed Collections of Formulas**

Our learning algorithm will construct conjectures that are "partially" correct, i.e., correct with respect to a subobject of the collection of all formulas/tests. Recall that this collection of all tests is formalised in our setting as the initial L-algebra (Φ, α : LΦ → Φ). To define a notion of partial correctness we need to consider subobjects of Φ to which we can restrict the theory map. This is formalised via the notion of a "subformula closed" subobject of Φ. The definition of such subobjects is based on the notion of *recursive coalgebra*. For L : D → D an endofunctor, a coalgebra f : X → LX is called *recursive* if for every L-algebra g : LY → Y there is a unique 'coalgebra-to-algebra' map g† making (5) commute.

**Definition 6.** *A subobject* j : Ψ → Φ *is called a* subformula closed collection *(of formulas) if there is a unique* L*-coalgebra structure* σ : Ψ → LΨ *such that* (Ψ,σ) *is a recursive* L*-coalgebra and* j *is the (necessarily unique) coalgebra-to-algebra map from* (Ψ,σ) *to the initial algebra* (Φ, α)*.*

*Remark 7.* The uniqueness of σ in Definition 6 is implied if L preserves monomorphisms. This is the case in our examples. The notion of recursive coalgebra goes back to [23,28]. The paper [1] contains a claim that the first item of our definition of subformula closed collection is implied by the second one if L preserves preimages. In our examples both properties of (Ψ,σ) are verified directly, rather than by relying on general categorical results.

*Example 8.* In the setting of Example 1, where the initial L-algebra is based on the set A∗ of words over the set (of inputs) A, a subset Ψ ⊆ A∗ is subformula closed if it is suffix-closed, i.e., if for all aw ∈ Ψ we have w ∈ Ψ as well.
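In code, suffix closure and the suffix-closedness check are immediate; this is our own small sketch, with closing {"aa"} reproducing the set {ε, a, aa} used in Sect. 2.

```python
# Suffix-closedness of a set of test words (the word-based instance of
# subformula closedness from Example 8).

def suffix_closure(words):
    """The smallest suffix-closed superset of the given words."""
    closed = set()
    for w in words:
        for i in range(len(w) + 1):
            closed.add(w[i:])             # every suffix, including ""
    return closed

def is_suffix_closed(words):
    """Check: for every word aw in the set, w is in the set too."""
    return all(w[1:] in words for w in words if w)
```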

*Example 9.* In the setting that B = (P−)^A for some set of actions A, C = Set and D = BA, the logic is given as a functor L on Boolean algebras as discussed in Example 3. As a subformula closed collection is an object Ψ of D, we are not simply dealing with a set of formulas, but with a Boolean algebra. The connection to the standard notion of being closed under taking subformulas in modal logic [7] can be sketched as follows: given a set Δ of modal formulas that is closed under taking subformulas, we define a Boolean algebra Ψ_Δ ⊆ Φ as the smallest Boolean subalgebra of Φ that is generated by the set Δ̂ = {[φ]_Φ | φ ∈ Δ}, where for a formula φ we let [φ]_Φ ∈ Φ denote its equivalence class in Φ.

It is then not difficult to define a suitable σ : Ψ_Δ → LΨ_Δ. As Ψ_Δ is generated by closing Δ̂ under Boolean operations, any two states x1, x2 in a given coalgebra (X, γ) satisfy (∀b ∈ Ψ_Δ. x1 ∈ ⟦b⟧ ⇔ x2 ∈ ⟦b⟧) iff (∀b ∈ Δ̂. x1 ∈ ⟦b⟧ ⇔ x2 ∈ ⟦b⟧). In other words, equivalence w.r.t. Ψ_Δ coincides with equivalence w.r.t. the *set* of formulas Δ. This explains why in the concrete algorithm we do not deal with Boolean algebras explicitly, but with subformula closed sets of formulas instead.

The key property of subformula closed collections Ψ is that we can restrict our attention to the so-called Ψ-theory map. Intuitively, subformula closedness is what allows us to define this theory map inductively.

$$\begin{array}{ccccc}
X & \xrightarrow{\;th^{\gamma}_{\Psi}\;} & & & Q\Psi \\
{\scriptstyle\gamma}\downarrow & & & & \uparrow{\scriptstyle Q\sigma} \\
BX & \xrightarrow{\;B\,th^{\gamma}_{\Psi}\;} & BQ\Psi & \xrightarrow{\;\delta^{\flat}_{\Psi}\;} & QL\Psi
\end{array} \quad (6)$$

**Lemma 10.** *Let* j : Ψ ↣ Φ *be a sub-formula closed collection, with coalgebra structure* σ : Ψ → LΨ*. Then* th^γ_Ψ := Qj ∘ th^γ_Φ *is the unique map making* (6) *commute. We call* th^γ_Ψ *the* Ψ*-theory map, and omit the* Ψ *when it is clear from the context.*

# **5 Reachability and the Base**

In this section, we define the notion of *base* of an endofunctor, taken from [8]. This allows us to speak about the (direct) successors of states in a coalgebra, and about reachability, which are essential ingredients of the learning algorithm.

**Definition 11.** *Let* B : C → C *be an endofunctor. We say* B has a base *if for every arrow* f : X → BY *there exist* g : X → BZ *and* m : Z ↣ Y *with* m *a monomorphism such that* f = Bm ∘ g*, and for any pair* g′ : X → BZ′*,* m′ : Z′ ↣ Y *with* Bm′ ∘ g′ = f *and* m′ *a monomorphism there is a unique arrow* h : Z → Z′ *such that* Bh ∘ g = g′ *and* m′ ∘ h = m*, see Diagram* (7)*. We call* (Z, g, m) *the* B*-*base *of the morphism* f*.*

We sometimes refer to m: Z Y as the base of f, omitting the g when it is irrelevant, or clear from the context. Note that the terminology 'the' base is justified, as it is easily seen to be unique up to isomorphism.

For example, let B : Set → Set, BX = <sup>2</sup> <sup>×</sup> <sup>X</sup><sup>A</sup>. The base of a map <sup>f</sup> : <sup>X</sup> <sup>→</sup> BY is given by <sup>m</sup>: <sup>Z</sup> <sup>Y</sup> , where <sup>Z</sup> <sup>=</sup> {(π<sup>2</sup> ◦ f)(x)(a) | x ∈ X, a ∈ A}, and m is the inclusion. The associated g : X → BZ is the corestriction of f to BZ.

For <sup>B</sup> = (P−)<sup>A</sup> : Set <sup>→</sup> Set, the <sup>B</sup>-base of <sup>f</sup> : <sup>X</sup> <sup>→</sup> <sup>Y</sup> is given by the inclusion m: Z Y , where Z = {y ∈ Y | ∃x ∈ X, ∃a ∈ A s.t. y ∈ f(x)(a)}.
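Both concrete bases can be computed directly at the level of sets. The sketch below is our own illustration of the two examples just given, with the maps f represented as Python dictionaries.

```python
# Computing the B-base of a map concretely, following the two examples:
# for BX = 2 x X^A the base of f : X -> BY collects every state hit by the
# transition part of f; for BX = (P X)^A it collects all successor states.

A = ["a", "b"]   # a fixed (hypothetical) input/action alphabet

def base_dfa(f, X):
    """f maps x to (accept_bit, {a: next_state}); Z = all targets of f."""
    return {f[x][1][a] for x in X for a in A}

def base_lts(f, X):
    """f maps x to {a: set_of_successors}; Z = all successors under f."""
    return {y for x in X for a in A for y in f[x][a]}
```

In both cases the inclusion of the computed set Z into Y is the mono part m of the base, and the corestriction of f to Z plays the role of g.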

**Proposition 12.** *Suppose* C *is complete and well-powered, and* B : C→C *preserves (wide) intersections. Then* B *has a base.*

If C is a locally presentable category, then it is complete and well-powered [3, Remark 1.56]. Hence, in that case, any functor B : C→C which preserves intersections has a base. The following lemma will be useful in proofs.

**Lemma 13.** *Let* B : C → C *be a functor that has a base and that preserves preimages. Let* f : S → BX *and* h : X → Y *be morphisms, let* (Z, g, m) *be the base of* f*, and let* e : Z → W*,* m′ : W ↣ Y *be the (strong epi, mono)-factorisation of* h ∘ m*. Then* (W, Be ∘ g, m′) *is the base of* Bh ∘ f*.*

The B-base provides an elegant way to relate reachability within a coalgebra to a monotone operator on the (complete) lattice of subobjects of the carrier of the coalgebra. Moreover, we will see that the least subcoalgebra that contains a given subobject of the carrier can be obtained via a standard least fixpoint construction. Finally, we will introduce the notion of prefix closed subobject of a coalgebra, generalising the prefix closedness condition from Angluin's algorithm.

By our assumption on C at the end of Sect. 3, the collection of subobjects (Sub(X), ≤) ordered as usual (cf. Section 3) forms a complete lattice. Recall that the meet on Sub(X) (intersection) is defined via pullbacks. In categories with coproducts, the join s<sup>1</sup> ∨s<sup>2</sup> of subobjects s1, s<sup>2</sup> ∈ Sub(X) is defined as the mono part of the factorisation of the map [s1, s2]: S1+S<sup>2</sup> → X, i.e., [s1, s2]=(s1∨s2)◦e for a strong epi e. In Set, this amounts to taking the union of subsets.

For a binary join s₁ ∨ s₂ we denote by *inl*∨ : S₁ → (S₁ ∨ S₂) and *inr*∨ : S₂ → (S₁ ∨ S₂) the embeddings that exist since sᵢ ≤ s₁ ∨ s₂ for i ∈ {1, 2}. Let us now define the key operator of this section.

$$\begin{array}{ccc}
S & \stackrel{s}{\longrightarrow} & X \\
{\scriptstyle g}\big\downarrow & & \big\downarrow{\scriptstyle \gamma} \\
B\Gamma(S) & \xrightarrow{\;B\Gamma^{B}_{\gamma}(s)\;} & BX
\end{array} \qquad (8)$$

**Definition 14.** *Let* B *be a functor that has a base,* s : S ↪ X *a subobject of some* X ∈ C *and let* (X, γ) *be a* B*-coalgebra. Let* (Γ(S), g, Γ^B_γ(s)) *be the* B*-base of* γ ∘ s*, see Diagram* (8)*. Whenever* B *and* γ *are clear from the context, we write* Γ(s) *instead of* Γ^B_γ(s)*.*

**Lemma 15.** *Let* B : C → C *be a functor with a base and let* (X, γ) *be a* B*-coalgebra. The operator* Γ : Sub(X) → Sub(X) *defined by* s ↦ Γ(s) *is monotone.*

Intuitively, Γ computes for a given set of states S the set of "immediate successors", i.e., the set of states that can be reached by applying γ to an element of S. We will see that pre-fixpoints of Γ correspond to subcoalgebras. Furthermore, Γ is the key to formulate our notion of closed table in the learning algorithm.

**Proposition 16.** *Let* s: S X *be a subobject and* (X, γ) ∈ Coalg(B) *for* X ∈ C *and* B : C→C *a functor that has a base. Then* s *is a subcoalgebra of* (X, γ) *if and only if* Γ(s) ≤ s*. Consequently, the collection of subcoalgebras of a given* B*-coalgebra forms a complete lattice.*

Using this connection, reachability of a pointed coalgebra (Definition 5) can be expressed in terms of the least fixpoint lfp of an operator defined in terms of Γ.

**Theorem 17.** *Let* B : C → C *be a functor that has a base. A pointed* B*-coalgebra* (X, γ, x₀) *is reachable iff* X ≅ lfp(Γ ∨ x₀) *(isomorphic as subobjects of* X*, i.e., equal).*

This justifies defining the reachable part from an initial state x₀ : 1 ↪ X as the least fixpoint of the monotone operator Γ ∨ x₀. Standard means of computing the least fixpoint by iterating this operator then give us a way to compute this subcoalgebra. Further, Γ provides a way to generalise the notion of "prefix closedness" from Angluin's L* algorithm to our categorical setting.
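Concretely, for BX = 2 × X^A in Set, the operator Γ sends a subset S to the set of immediate successors of its states, and iterating Γ ∨ x₀ from the empty subobject computes the reachable part. A small Python sketch (our own; the coalgebra gamma and all names are invented for illustration):

```python
# Sketch: the reachable part of a pointed coalgebra gamma : X -> 2 x X^A
# as the least fixpoint of S |-> Gamma(S) v x0, iterated from the bottom
# element of Sub(X) (the empty subset).

A = ['a', 'b']
gamma = {                       # example coalgebra on X = {0, 1, 2, 3}
    0: (False, {'a': 1, 'b': 0}),
    1: (True,  {'a': 1, 'b': 0}),
    2: (False, {'a': 3, 'b': 1}),   # 2 and 3 are unreachable from 0
    3: (True,  {'a': 2, 'b': 3}),
}
x0 = {0}                        # the point x0 : 1 -> X, as a singleton subset

def Gamma(S):
    """Immediate successors of S under gamma."""
    return {gamma[x][1][a] for x in S for a in A}

S = set()                       # bottom of Sub(X)
while True:
    T = Gamma(S) | x0           # apply Gamma v x0
    if T == S:                  # least fixpoint reached
        break
    S = T

assert S == {0, 1}              # the reachable part from x0
```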

**Definition 18.** *Let* s₀, s ∈ Sub(X) *for some* X ∈ C *and let* (X, γ) *be a* B*-coalgebra. We call* s s₀*-*prefix closed w.r.t. γ *if* s = s₀ ∨ s₁ ∨ ··· ∨ sₙ *for some* n ≥ 0 *and a collection* {sᵢ | i = 1, …, n} *with* sⱼ₊₁ ≤ Γ(s₀ ∨ ··· ∨ sⱼ) *for all* j *with* 0 ≤ j < n*.*
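In Set, Definition 18 can be checked directly: a subset is s₀-prefix closed if it decomposes into pieces, each consisting of successors of the union of the earlier pieces. A Python sketch (our own illustration; the coalgebra, alphabet and all names are invented):

```python
# Sketch: s0-prefix closedness for subsets of the carrier of a coalgebra
# gamma : X -> 2 x X^A. pieces = [s0, s1, ..., sn]; we check that
# s_{j+1} is contained in Gamma(s0 v ... v sj) for every j < n.

A = ['a']
gamma = {0: (False, {'a': 1}), 1: (False, {'a': 2}), 2: (True, {'a': 2})}

def Gamma(S):
    return {gamma[x][1][a] for x in S for a in A}

def is_prefix_closed(pieces):
    acc = set()
    for j in range(len(pieces) - 1):
        acc |= pieces[j]
        if not pieces[j + 1] <= Gamma(acc):
            return False
    return True

assert is_prefix_closed([{0}, {1}, {2}])
assert not is_prefix_closed([{0}, {2}])   # 2 is not an immediate successor of 0
```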

### **6 Learning Algorithm**

We define a general learning algorithm for B-coalgebras. First, we describe the setting, in general and slightly informal terms. The teacher has a pointed B-coalgebra (X, γ, s₀). Our task is to 'learn' a pointed B-coalgebra (S, γ̂, ŝ₀) s.t.:


The first point means that the learned coalgebra is 'correct', that is, it agrees with the coalgebra of the teacher on all possible tests from the initial state. For instance, in case of deterministic automata and their logic in Example 1, this just means that the language of the learned automaton is the correct one.

In the learning game, we are only provided limited access to the coalgebra γ : X → BX. Concretely, the teacher gives us:


The first three points correspond respectively to the standard notions of membership query ('filling in' the table with rows S and columns Ψ), equivalence query and counterexample generation. The last point, about the base, is more unusual: it does not occur in the standard algorithm, since there a canonical choice of (X, γ) is used, which allows next states to be represented in a fixed manner. It is required in our setting of an arbitrary coalgebra (X, γ).

In the remainder of this section, we describe the abstract learning algorithm and its correctness. First, we describe the basic ingredients needed for the algorithm: tables, closedness, counterexamples and a procedure to close a given table (Sect. 6.1). Based on these notions, the actual algorithm is presented (Sect. 6.2), followed by proofs of correctness and termination (Sect. 6.3).

#### **Assumption 19.** *Throughout this section, we assume*


*Moreover, we assume a pointed* B*-coalgebra* (X, γ, s0)*.*

*Remark 20.* We restrict to C = Set, but see it as a key contribution to state the algorithm in categorical terms: the assumptions cover a wide class of functors on Set, which is the main direction of generalisation. Further, the categorical approach will enable future generalisations. The assumptions on the category C are: it is complete, well-powered and satisfies that for every (strong) epi q : S → S′ in C and every mono i : S′′ → S such that q ∘ i is mono, there is a morphism q⁻¹ : S′ → S such that q ∘ q⁻¹ = id and q⁻¹ ∘ q ∘ i = i.

#### **6.1 Tables and Counterexamples**

**Definition 21.** *A* table *is a pair* (s : S ↪ X, i : Ψ ↪ Φ) *consisting of a subobject* s *of* X *and a subformula-closed subobject* i *of* Φ*.*

To make the notation a bit lighter, we sometimes refer to a table by (S, Ψ), using s and i respectively to refer to the actual subobjects. The pair (S, Ψ) represents 'rows' and 'columns' respectively, in the table; the 'elements' of the table are given abstractly by the map *th*^γ_Ψ ∘ s. In particular, if C = D = Set and Q = 2^−, then this is a map S → 2^Ψ, assigning a Boolean value to every pair of a row (state) and a column (formula).

For the definition of closedness, we use the operator Γ from Definition 14, which characterises the successors of a subobject S ↪ X.

**Definition 22.** *A table* (S, Ψ) *is* closed *if there exists a map* k : Γ(S) → S *such that Diagram* (9) *commutes. A table* (S, Ψ) *is* sharp *if the composite map* *th*^γ_Ψ ∘ s : S → QΨ *is monic.*

Thus, a table (S, Ψ) is closed if all the successors of states (elements of Γ(S)) are already represented in S, up to equivalence w.r.t. the tests in Ψ. In other terms, the rows corresponding to successors of existing rows are already in the table. Sharpness amounts to minimality w.r.t. logical equivalence: every row has a unique value. The latter will be an invariant of the algorithm (Theorem 32).
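For deterministic automata in Set, closedness and sharpness take the familiar observation-table form: rows are access words, columns are tests (suffixes), and *th*^γ_Ψ ∘ s becomes the row function. The sketch below is our own illustration (the language oracle member and all names are invented):

```python
# Sketch: closedness and sharpness of a table over a DFA-style coalgebra.
# States are access words, tests are suffixes; row(w) plays the role of
# the theory map composed with the subobject s.

A = ['a', 'b']

def member(w):                      # hypothetical teacher: words ending in 'a'
    return w.endswith('a')

def row(w, tests):
    return tuple(member(w + t) for t in tests)

def is_sharp(S, tests):
    """Sharp: the row map is monic, i.e. distinct states have distinct rows."""
    return len({row(w, tests) for w in S}) == len(S)

def is_closed(S, tests):
    """Closed: every successor's row already occurs among the rows of S."""
    rows = {row(w, tests) for w in S}
    return all(row(w + a, tests) in rows for w in S for a in A)

S, tests = [''], ['', 'a']
assert is_sharp(S, tests)
assert not is_closed(S, tests)      # the successor 'a' has a new row
```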

A *conjecture* is a coalgebra on S, which is not quite a subcoalgebra of X: instead, it is a subcoalgebra 'up to equivalence w.r.t. Ψ', that is, the successors agree up to logical equivalence.

$$\begin{array}{ccccc}
S & \stackrel{s}{\longrightarrow} & X & \stackrel{\gamma}{\longrightarrow} & BX \\
{\scriptstyle \hat{\gamma}}\big\downarrow & & & & \big\downarrow{\scriptstyle B\,th^{\gamma}_{\Psi}} \\
BS & \xrightarrow{\;Bs\;} & BX & \xrightarrow{\;B\,th^{\gamma}_{\Psi}\;} & BQ\Psi
\end{array} \qquad (10)$$

**Definition 23.** *Let* (S, Ψ) *be a table. A coalgebra structure* γ̂ : S → BS *is called a* conjecture *(for* (S, Ψ)*) if Diagram* (10) *commutes.*

It is essential to be able to construct a conjecture from a closed table. The following, stronger result is a variation of Proposition 16.

**Theorem 24.** *A sharp table is closed iff there exists a conjecture for it. Moreover, if the table is sharp and* B *preserves monos, then this conjecture is unique.*

Our goal is to learn a pointed coalgebra which is correct w.r.t. all formulas. To this aim we ensure correctness w.r.t. an increasing sequence of subformula closed collections Ψ.

**Definition 25.** *Let* (S, Ψ) *be a table, and let* (S, γ̂, ŝ₀) *be a pointed* B*-coalgebra on* S*. We say* (S, γ̂, ŝ₀) *is* correct *w.r.t.* Ψ *if Diagram* (11) *commutes.*

All conjectures constructed during the learning algorithm will be correct w.r.t. the subformula closed collection Ψ of formulas under consideration.

**Lemma 26.** *Suppose* (S, Ψ) *is closed, and* γ̂ *is a conjecture. Then* *th*^γ_Ψ ∘ s = *th*^γ̂_Ψ : S → QΨ*. If* ŝ₀ : 1 → S *satisfies* s ∘ ŝ₀ = s₀ *then* (S, γ̂, ŝ₀) *is correct w.r.t.* Ψ*.*

We next define the crucial notion of *counterexample* to a pointed coalgebra: a subobject Ψ′ of Φ, extending Ψ, on which the coalgebra is 'incorrect'.

**Definition 27.** *Let* (S, Ψ) *be a table, and let* (S, γ̂, ŝ₀) *be a pointed* B*-coalgebra on* S*. Let* Ψ′ *be a subformula closed subobject of* Φ*, such that* Ψ *is a subcoalgebra of* Ψ′*. We say* Ψ′ *is a* counterexample *(for* (S, γ̂, ŝ₀)*,* extending Ψ*) if* (S, γ̂, ŝ₀) *is* not *correct w.r.t.* Ψ′*.*

The following elementary lemma states that if there are no more counterexamples for a coalgebra, then it is correct w.r.t. the object Φ of all formulas.

**Lemma 28.** *Let* (S, Ψ) *be a table, and let* (S, γ̂, ŝ₀) *be a pointed* B*-coalgebra on* S*. Suppose that there are no counterexamples for* (S, γ̂, ŝ₀) *extending* Ψ*. Then* (S, γ̂, ŝ₀) *is correct w.r.t.* Φ*.*

The following describes, for a given table, how to extend it with the successors (in X) of all states in S. As we will see below, by repeatedly applying this construction, one eventually obtains a closed table.

**Definition 29.** *Let* (S, Ψ) *be a sharp table. Let* (S′, q, r) *be the (strong epi, mono)-factorisation of the map* *th*^γ_Ψ ∘ (s ∨ Γ(s))*, as in the diagram:*

*We define* close(S, Ψ) := {s′ : S′ ↪ X | *th*^γ_Ψ ∘ s′ = r, s ≤ s′ ≤ s ∨ Γ(s)}*. For each* s′ ∈ close(S, Ψ) *we have* s ≤ s′ *and thus* s = s′ ∘ κ *for some* κ : S → S′*.*

**Lemma 30.** *In Definition 29, for each* s′ ∈ close(S, Ψ)*, we have* κ = q ∘ *inl*∨*.*

We will refer to κ = q ∘ *inl*∨ as the connecting map from s to s′.

**Lemma 31.** *In Definition 29, if there exists* q⁻¹ : S′ → S ∨ Γ(S) *such that* q ∘ q⁻¹ = id *and* q⁻¹ ∘ q ∘ *inl*∨ = *inl*∨*, then* close(S, Ψ) *is non-empty.*

By our assumptions, the hypothesis of Lemma 31 is satisfied (Remark 20), hence close(S, Ψ) is non-empty. It is precisely (and only) at this point that we need the strong condition about existence of right inverses to epimorphisms.

#### **6.2 The Algorithm**

Having defined closedness, counterexamples and a procedure for closing a table, we are ready to define the abstract algorithm. In the algorithm, the teacher has access to a function counter((S, γ̂, ŝ₀), Ψ), which returns the set of all counterexamples (extending Ψ) for the conjecture (S, γ̂, ŝ₀). If this set is empty, the coalgebra (S, γ̂, ŝ₀) is correct (see Lemma 28); otherwise the teacher picks one of its elements Ψ′. We also make use of close(S, Ψ), as given in Definition 29.

```
Algorithm 1. Abstract learning algorithm
```

```
1:  (s : S ↪ X) ← (s0 : 1 ↪ X)
2:  ŝ0 ← id1
3:  Ψ ← 0
4:  while true do
5:     while (s : S ↪ X, Ψ) is not closed do
6:        let (s′ : S′ ↪ X) ∈ close(S, Ψ), with connecting map κ : S → S′
7:        (s : S ↪ X) ← (s′ : S′ ↪ X)
8:        ŝ0 ← κ ◦ ŝ0
9:     end while
10:    let (S, γ̂) be a conjecture for (S, Ψ)
11:    if counter((S, γ̂, ŝ0), Ψ) = ∅ then
12:       return (S, γ̂, ŝ0)
13:    else
14:       Ψ ← Ψ′ for some Ψ′ ∈ counter((S, γ̂, ŝ0), Ψ)
15:    end if
16: end while
```
The algorithm takes as input the coalgebra (X, γ, s0) (which we fixed throughout this section). In every iteration of the outer loop, the table is first closed by repeatedly applying the procedure in Definition 29. Then, if the conjecture corresponding to the closed table is correct, the algorithm returns it (Line 12). Otherwise, a counterexample is chosen (Line 14), and the algorithm continues.
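For intuition, the following Python sketch instantiates the shape of Algorithm 1 for deterministic automata over a fixed regular language. It is a loose, Set-based rendering, not the paper's categorical algorithm: the teacher (member and the brute-force counterexample search), the target language and all names are our own inventions, counterexamples are single words rather than subformula-closed subobjects, and the column set is extended with all suffixes of a counterexample to keep it suffix closed.

```python
# Rough Set-based sketch of the abstract learning loop for DFAs.

A = ['a', 'b']

def member(w):                              # invented target: even number of 'a's
    return w.count('a') % 2 == 0

def row(w, tests):
    return tuple(member(w + t) for t in tests)

def close(S, tests):                        # add successors whose row is new
    for w in list(S):
        for a in A:
            if row(w + a, tests) not in {row(v, tests) for v in S}:
                S = S + [w + a]
    return S

def conjecture(S, tests):                   # the conjecture of a closed table
    by_row = {row(w, tests): w for w in S}
    delta = {w: {a: by_row[row(w + a, tests)] for a in A} for w in S}
    accept = {w: member(w) for w in S}
    return delta, accept

def run(delta, accept, w):
    q = ''
    for a in w:
        q = delta[q][a]
    return accept[q]

def counterexample(delta, accept, max_len=6):   # teacher, simulated brute-force
    words = ['']
    for w in words:
        if len(w) <= max_len:
            if run(delta, accept, w) != member(w):
                return w
            words.extend(w + a for a in A)
    return None

S, tests = [''], ['']
while True:
    while True:                              # inner loop: close the table
        S2 = close(S, tests)
        if S2 == S:
            break
        S = S2
    delta, accept = conjecture(S, tests)
    cex = counterexample(delta, accept)
    if cex is None:
        break                                # no counterexample: return conjecture
    for i in range(len(cex) + 1):            # extend tests with all suffixes of cex
        if cex[i:] not in tests:
            tests.append(cex[i:])

assert run(delta, accept, 'aa') and not run(delta, accept, 'ab')
```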

#### **6.3 Correctness and Termination**

Correctness is stated in Theorem 33. It relies on establishing loop invariants:

**Theorem 32.** *The following is an invariant of both loops in Algorithm 1 in Sect. 6.2: 1.* (S, Ψ) *is sharp, 2.* s ∘ ŝ₀ = s₀*, and 3.* s *is* s₀*-prefix closed w.r.t.* γ*.*

**Theorem 33.** *If Algorithm 1 in Sect. 6.2 terminates, then it returns a pointed coalgebra* (S, γ̂, ŝ₀) *which is minimal w.r.t. logical equivalence, reachable and correct w.r.t.* Φ*.*

In our termination arguments, we have to make an assumption about the coalgebra which is to be learned. It does not need to be finite itself, but it should be finite up to logical equivalence—in the case of deterministic automata, for instance, this means the teacher has a (possibly infinite) automaton representing a regular language. To speak about this precisely, let Ψ be a subobject of Φ. We take a (strong epi, mono)-factorisation of the theory map, i.e., *th*^γ_Ψ = m_Ψ ∘ e_Ψ, where e_Ψ : X ↠ |X|_Ψ is a strong epi and m_Ψ : |X|_Ψ ↪ QΨ is a mono. We call the object |X|_Ψ in the middle the Ψ*-logical quotient*. For the termination result (Theorem 37), |X|_Φ is assumed to have finitely many quotients and subobjects, which in Set just amounts to finiteness.

We start with termination of the inner while loop (Corollary 36). This relies on two results: first, that once the connecting map κ is an iso, the table is closed, and second, that—under a suitable assumption on the coalgebra (X, γ)—during execution of the inner while loop, the map κ will eventually be an iso.

**Theorem 34.** *Let* (S, Ψ) *be a sharp table, let* s′ ∈ close(S, Ψ) *and let* κ : S → S′ *be the connecting map. If* κ *is an isomorphism, then* (S, Ψ) *is closed.*

**Lemma 35.** *Consider a sequence of sharp tables* (sᵢ : Sᵢ ↪ X, Ψ)ᵢ∈ℕ *such that* sᵢ₊₁ ∈ close(Sᵢ, Ψ) *for all* i*. Moreover, let* (κᵢ : Sᵢ → Sᵢ₊₁)ᵢ∈ℕ *be the connecting maps (Definition 29). If the logical quotient* |X|_Φ *of* X *has finitely many subobjects, then* κᵢ *is an isomorphism for some* i ∈ ℕ*.*

**Corollary 36.** *If the* Φ*-logical quotient* |X|_Φ *has finitely many subobjects, then the inner while loop of Algorithm 1 terminates.*

For the outer loop, we assume that |X|_Φ has finitely many quotients, ensuring that every sequence of counterexamples proposed by the teacher is finite.

**Theorem 37.** *If the* Φ*-logical quotient* |X|_Φ *has finitely many quotients and finitely many subobjects, then Algorithm 1 terminates.*

# **7 Future Work**

We showed how duality plays a natural role in automata learning, through the central connection between states and tests. Based on this foundation, we proved correctness and termination of an abstract algorithm for coalgebra learning. The generality is not so much in the base category (which, for the algorithm, we take to be Set) but rather in the functor used: we only require a few mild conditions on the functor, and make no assumptions about its shape. We therefore regard the approach as *coalgebra learning* rather than automata learning.

Returning to automata, an interesting direction is to extend the present work to cover learning of, e.g., non-deterministic or alternating automata [5,9] for a regular language. This would require explicitly handling branching in the type of coalgebra. One promising direction would be to incorporate the forgetful logics of [19], which are defined within the same framework of coalgebraic logic as the current work. It is not difficult to define in this setting what it means for a table to be closed 'up to the branching part', stating, e.g., that even though the table is not closed, all the successors of rows are present as combinations of other rows.

Another approach would be to integrate monads into our framework, which are also used to handle branching within the theory of coalgebras [16]. It is an intriguing question whether the current approach, which makes it possible to move beyond automata-like examples, can be combined with the CALF framework [13], which goes a long way in handling the branching occurring in various kinds of automata.

**Acknowledgments.** We are grateful to Joshua Moerman, Nick Bezhanishvili, Gerco van Heerdt, Aleks Kissinger and Stefan Milius for valuable discussions and suggestions.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Tight Worst-Case Bounds for Polynomial Loop Programs**

Amir M. Ben-Amram¹ and Geoff W. Hamilton²

¹ School of Computer Science, Tel-Aviv Academic College, Tel Aviv, Israel amirben@mta.ac.il

² School of Computing, Dublin City University, Dublin 9, Ireland hamilton@computing.dcu.ie

**Abstract.** In 2008, Ben-Amram, Jones and Kristiansen showed that for a simple programming language—representing non-deterministic imperative programs with bounded loops, and arithmetics limited to addition and multiplication—it is possible to decide precisely whether a program has certain growth-rate properties, in particular whether a computed value, or the program's running time, has a polynomial growth rate.

A natural and intriguing problem was to improve the precision of the information obtained. This paper shows how to obtain asymptotically tight *multivariate* polynomial bounds for this class of programs. This is a complete solution: whenever a polynomial bound exists it will be found.

### **1 Introduction**

One of the most important properties we would like to know about programs is their *resource usage*, i.e., the amount of resources (such as time, memory and energy) required for their execution. This information is useful during development, when performance bugs and security vulnerabilities exploiting performance issues can be avoided. It is also particularly relevant for mobile applications, where resources are limited, and for cloud services, where resource usage is a major cost factor.

In the literature, a lot of different "cost analysis" problems (also called "resource bound analysis," etc.) have been studied (e.g. [1,11,13,18,19,24,26, 27]); several of them may be grouped under the following general definition. The *countable resource problem* asks about the maximum usage of a "resource" that accumulates during execution, and which one can explicitly count, by instrumenting the program with an accumulator variable and instructions to increment it where necessary. For example, we can estimate the *execution time* of a program by counting certain "basic steps". Another example is counting the number of visits to designated program locations. Realistic problems of this type include bounding the number of calls to specific functions, perhaps to system services; the number of I/O operations; number of accesses to memory, etc. The consumption of resources such as *energy* suits our problem formulation as long as such explicit bookkeeping is possible (we have to assume that the increments, if not constant, are given by a monotone polynomial expression).

© The Author(s) 2019

M. Bojańczyk and A. Simpson (Eds.): FOSSACS 2019, LNCS 11425, pp. 80–97, 2019. https://doi.org/10.1007/978-3-030-17127-8_5

In this paper we solve the *bound analysis problem* for a particular class of programs, defined in [7]. The bound analysis problem is to find symbolic bounds on the maximal possible value of an integer variable at the end of the program, in terms of some integer-valued variables that appear in the initial state of a computation. Thus, a solution to this problem might be used for any of the resource-bound analyses above. In this work we focus on values that grow polynomially (in the sense of being bounded by a polynomial), and our goal is to find polynomial bounds that are tight, in the sense of being precise up to a constant factor.

The programs we study are expressed by the so-called *core language*. It is imperative, including bounded loops, non-deterministic branches and restricted arithmetic expressions; the syntax is shown in Fig. 1. Semantics is explained and motivated below, but is largely intuitive; see also the illustrative example in Fig. 2. In 2008, it was proved [7] that for this language it is decidable whether a computed result is polynomially bounded or not. This makes the language an attractive target for work on the problem of computing tight bounds. However, for the past ten years there has been no improvement on [7]. We now present an algorithm to compute, for every program in the language, and every variable in the program which has a polynomial upper bound (in terms of input values), a tight polynomial bound on its largest attainable value (informally, "the worstcase value") as a function of the input values. The bound is guaranteed to be tight up to a multiplicative constant factor but constants are left implicit (for example a bound quadratic in n will always be represented as n<sup>2</sup>). The algorithm could be extended to compute upper and lower bounds with explicit constant factors, but choosing to ignore coefficients simplifies the algorithm considerably. In fact, we have striven for a simple, comprehensible algorithm, and we believe that the algorithm we present is sufficiently simple that, beyond being comprehensible, offers insight into the structure of computations in this model.

#### **1.1 The Core Language**

The programs we study are expressed by the so-called *core language*. It is imperative, including bounded loops, non-deterministic branches and restricted arithmetic expressions; the syntax is shown in Fig. 1. Semantics is explained and motivated below, but is largely intuitive; see also the illustrative example in Fig. 2. In 2008, it was proved [7] that for this language it is decidable whether a computed result is polynomially bounded or not. This makes the language an attractive target for work on the problem of computing tight bounds. However, for the past ten years there has been no improvement on [7]. We now present an algorithm to compute, for every program in the language, and every variable in the program which has a polynomial upper bound (in terms of input values), a tight polynomial bound on its largest attainable value (informally, "the worst-case value") as a function of the input values. The bound is guaranteed to be tight up to a multiplicative constant factor, but constants are left implicit (for example, a bound quadratic in n will always be represented as n²). The algorithm could be extended to compute upper and lower bounds with explicit constant factors, but choosing to ignore coefficients simplifies the algorithm considerably. In fact, we have striven for a simple, comprehensible algorithm, and we believe that the algorithm we present is sufficiently simple that, beyond being comprehensible, it offers insight into the structure of computations in this model.

#### **1.1 The Core Language**

*Data.* It is convenient to assume (without loss of generality) that the only type of data is non-negative integers. Note that a realistic (not "core") program may include many statements that manipulate non-integer data that are not relevant to loop control—so in a complexity analysis, we may be able to abstract these parts away and still analyze the variables of interest. In other cases, it is possible to preprocess a program to replace complex data values with their size (or "norm"), which is the quantity of importance for loop control. Methods for this process have been widely studied in conjunction with termination and cost analysis.

$$\begin{array}{lcl}
\texttt{X} \in \texttt{Variable} & ::= & \texttt{X}_1 \mid \texttt{X}_2 \mid \texttt{X}_3 \mid \ldots \mid \texttt{X}_n \\
\texttt{E} \in \texttt{Expression} & ::= & \texttt{X} \mid \texttt{E} + \texttt{E} \mid \texttt{E} * \texttt{E} \\
\texttt{C} \in \texttt{Command} & ::= & \texttt{skip} \mid \texttt{X} := \texttt{E} \mid \texttt{C}_1;\, \texttt{C}_2 \mid \texttt{loop}\ \texttt{E}\ \{\texttt{C}\} \\
& & \mid\ \texttt{choose}\ \texttt{C}_1\ \texttt{or}\ \texttt{C}_2
\end{array}$$

**Fig. 1.** Syntax of the core language.

*Command Semantics.* The core language is inherently non-deterministic. The choose command represents a non-deterministic choice, and can be used to abstract any concrete conditional command by simply ignoring the condition; this is necessary to ensure that our analysis problem is decidable. Note that what we ignore is branches within a loop body and not branches that implement the loop control, which we represent by a dedicated loop command. The command loop E {C} repeats C a (non-deterministic) number of times bounded by the value of E, which is evaluated just before the loop is entered. Thus, as a conservative abstraction, it may be used to model different forms of loops (for-loops, while-loops) as long as a bound on the number of iterations, as a function of the program state on loop initiation, can be determined and expressed in the language. There is an ample body of research on analysing programs to find such bounds where they are not explicitly given by the programmer; in particular, bounds can be obtained from a *ranking function* for the loop [2,3,5,6,23]. Note that the arithmetic in our language is too restricted to allow for the maintenance of counters and the creation of *while* loops, as there is no subtraction, no explicit constants and no tests. Thus, for realistic "concrete" programs which use such devices, loop-bound analysis is supposed to be performed *on the concrete program* as part of the process of abstracting it to the core language. This process is illustrated in [9, Sect. 2].

From a computability viewpoint, the use of bounded loops restricts the programs that can be represented to those that compute primitive recursive functions; this is a rich enough class to cover a lot of useful algorithms and make the analysis problem challenging. In fact, our language resembles a weakened version of Meyer and Ritchie's LOOP language [20], which computes all the primitive recursive functions, and where behavioral questions like "is the result linearly bounded" are undecidable.

```
loop X1 {
   loop X2 + X3 { choose { X3:= X1; X2:= X4 } or { X3:= X4; X2:= X1 } };
   X4:= X2 + X3
};
loop X4 { choose { X3:= X1 + X2 + X3 } or { X3:= X2; X2:= X1 } }
```
**Fig. 2.** A core-language program. loop *n* C means "do C at most *n* times."

#### **1.2 The Algorithm**

Consider the program in Fig. 2. Suppose that it is started with the values of the variables X1, X2, ... being x₁, x₂, ... . Our purpose is to bound the values of all variables at the conclusion of the program in terms of those initial values. Indeed, they are all polynomially bounded, and our algorithm provides tight bounds. For instance, it establishes that the final value of X3 is tightly bounded (up to a constant factor) by max(x₄(x₄ + x₁²), x₄(x₂ + x₃ + x₁²)).

In fact, it produces information in a more precise form, as *a disjunction of simultaneous bounds*. This means that it generates vectors, called *multi-polynomials*, that give simultaneous bounds on all variables; for example, with the program in Fig. 2, one such multi-polynomial is ⟨x₁, x₂, x₃, x₄⟩ (this is the result of all loops taking a very early exit). This form is important in the context of a compositional analysis. To see why, suppose that we provide, for a command with variables X, Y, the bounds ⟨x, y⟩ and ⟨y, x⟩. Then we know that the *sum* of their values is always bounded by x + y, a result that would not have been deduced had we given the bound max(x, y) on each of the variables. The difference may be critical for the success of analyzing an enclosing or subsequent command.

*Multivariate* bounds are often of interest, and perhaps require no justification, but let us point out that multivariate polynomials are necessary even if we're ultimately interested in a univariate bound, in terms of some single initial value, say n. This is, again, due to the analysis being compositional. When we analyze an internal command that uses variables X, Y,... we do not know in what possible contexts the command will be executed and how the values of these variables will be related to n.

Some highlights of our solution are as follows.


The remainder of this paper is structured as follows. In Sect. 2 we give some definitions and state our main result. In Sects. 3, 4 and 5 we present our algorithm. In Sect. 6, we outline the correctness proofs. Section 7 considers related work, and Sect. 8 concludes and discusses ideas for further work.

#### **2 Preliminaries**

In this section, we give some basic definitions, complete the presentation of our programming language and precisely state the main result.

#### **2.1 Some Notation and Terminology**

*The Language.* We remark that in our language syntax there is no special form for a "program unit"; in the text we sometimes use "program" for the subject of our analysis, yet syntactically it's just a command.

*Polynomials and Multi-polynomials.* We work throughout this article with multivariate polynomials in x1,...,x<sup>n</sup> that have non-negative integer coefficients and no variables other than x1,...,xn; when we speak of a polynomial we always mean one of this kind. Note that over the non-negative integers, such polynomials are monotonically (weakly) increasing in all variables.

The post-fix substitution operator [a/b] may be applied to any sort of expression containing a variable b, to substitute a instead; e.g., (x² + yx + y)[2z/y] = x² + 2zx + 2z.

When discussing a command, state-transition, or program trace, with a variable Xi, xᵢ will denote, as a rule, the initial value of this variable, and x′ᵢ its final value. Thus we distinguish the syntactic entity by the typewriter font. We write the polynomials manipulated by our algorithms using the variable names xᵢ. We presume that an implementation of the algorithm represents polynomials concretely so that ordinary operations such as composition can be applied, but otherwise we do not concern ourselves much with representation.

The parameter n always refers to the number of variables in the subject program. The set [n] is {1, ..., n}. For a set S, an n-tuple over S is a mapping from [n] to S. The set of these tuples is denoted by Sⁿ. Throughout the paper, various natural liftings of operators to collections of objects are tacitly assumed, e.g., if S is a set of integers then S + 1 is the set {s + 1 | s ∈ S} and S + S is {s + t | s, t ∈ S}. We use such liftings with sets as well as with tuples. If S is ordered, we extend the ordering to Sⁿ by comparing tuples element-wise (this leads to a partial order, in general; e.g., with natural numbers, ⟨1, 3⟩ and ⟨2, 2⟩ are incomparable).

**Definition 1.** *A* polynomial transition (PT) *represents a mapping of an "input" state* **x** = ⟨x₁, ..., xₙ⟩ *to a "result" state* **x′** = ⟨x′₁, ..., x′ₙ⟩ = **p**(**x**)*, where* **p** = ⟨**p**[1], ..., **p**[n]⟩ *is an* n*-tuple of polynomials. Such a* **p** *is called a* multi-polynomial (MP)*; we denote by* MPol *the set of multi-polynomials, where the number of variables* n *is fixed by context.*

Multi-polynomials are used in this work to represent the effect of a command. Various operations will be applied to MPs, mostly obvious—in particular, composition (which corresponds to sequential application of the transitions). Note that composition of multi-polynomials, **q** ∘ **p**, is naturally defined since **p** supplies n values for the n variables of **q** (in other words, they are composed as functions ℕⁿ → ℕⁿ). We define *Id* to be the identity transformation, **x′** = **x** (in MP notation: **p**[i] = xᵢ for i = 1, ..., n).
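As an illustration of MP composition, here is a Python sketch of our own (the dict-of-monomials representation and all names are assumptions, not the paper's): a polynomial over x₁, ..., xₙ is a mapping from exponent tuples to non-negative coefficients, and **q** ∘ **p** is computed by substitution.

```python
# Sketch: multi-polynomials as n-tuples of polynomials, with composition
# q o p modelling sequential execution. A polynomial is a dict
# {exponent-tuple: coefficient} with non-negative integer coefficients.

n = 2  # number of program variables

def poly_add(p, q):
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + c
    return r

def poly_mul(p, q):
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(a + b for a, b in zip(m1, m2))
            r[m] = r.get(m, 0) + c1 * c2
    return r

def poly_subst(p, mp):
    """Substitute mp[i] for variable i in p (composition N^n -> N)."""
    result = {}
    for m, c in p.items():
        term = {tuple([0] * n): c}      # start from the constant c
        for i, e in enumerate(m):
            for _ in range(e):
                term = poly_mul(term, mp[i])
        result = poly_add(result, term)
    return result

def mp_compose(q, p):
    """(q o p)[i] is q[i] with each x_j replaced by p[j]."""
    return tuple(poly_subst(qi, p) for qi in q)

x1 = {(1, 0): 1}                        # the polynomial x1
x2 = {(0, 1): 1}                        # the polynomial x2
Id = (x1, x2)

p = ({(1, 0): 1, (0, 1): 1}, x2)        # effect of X1 := X1 + X2
q = ({(1, 1): 1}, x2)                   # effect of X1 := X1 * X2
qp = mp_compose(q, p)                   # effect of X1:=X1+X2 ; X1:=X1*X2

assert qp[0] == {(1, 1): 1, (0, 2): 1}  # x1*x2 + x2^2
assert mp_compose(Id, p) == p
```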

#### **2.2 Formal Semantics of the Core Language**

The semantics associates with every command C over variables X1, ..., Xn a relation [[C]] ⊆ ℕⁿ × ℕⁿ. In the expression *x*[[C]]*y*, vector *x* (respectively *y*) is the store before (after) the execution of C.

The semantics of skip is the identity. The semantics of an assignment Xi:=E associates to each store *x* a new store *y* obtained by replacing the component x<sup>i</sup> by the value of the expression E when evaluated over store *x*. This is defined in the natural way (details omitted), and is denoted by [[E]]*x*. Composite commands are described by the straightforward equations:

$$\begin{aligned} [\![\mathtt{C}_1;\mathtt{C}_2]\!] &= [\![\mathtt{C}_2]\!] \circ [\![\mathtt{C}_1]\!] \\ [\![\texttt{choose } \mathtt{C}_1 \texttt{ or } \mathtt{C}_2]\!] &= [\![\mathtt{C}_1]\!] \cup [\![\mathtt{C}_2]\!] \\ [\![\texttt{loop } \mathtt{E}\ \{\mathtt{C}\}]\!] &= \{(x, y) \mid \exists i \le [\![\mathtt{E}]\!]x : x\,[\![\mathtt{C}]\!]^i\, y\} \end{aligned}$$

where [[C]]<sup>i</sup> represents [[C]] ◦ ··· ◦ [[C]] (i occurrences of [[C]]); and [[C]]<sup>0</sup> = *Id*.
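These equations can be read off directly as a naive interpreter. The following Python sketch is our own illustration (not code from the paper; all constructor names are ours): it computes, for a command and an input store, the set of all possible output stores under [[C]].

```python
# Commands as nested tuples; stores as tuples of naturals.  ev(c, s) returns
# the set of stores y with s [[c]] y, following the semantic equations above.

def ev_expr(e, s):
    # expressions: ('var', i) with i 1-based, ('add', e1, e2), ('mul', e1, e2)
    tag = e[0]
    if tag == 'var':
        return s[e[1] - 1]
    if tag == 'add':
        return ev_expr(e[1], s) + ev_expr(e[2], s)
    if tag == 'mul':
        return ev_expr(e[1], s) * ev_expr(e[2], s)
    raise ValueError(tag)

def ev(c, s):
    tag = c[0]
    if tag == 'skip':
        return {s}
    if tag == 'assign':                          # ('assign', i, e): Xi := E
        i = c[1]
        return {s[:i - 1] + (ev_expr(c[2], s),) + s[i:]}
    if tag == 'seq':                             # [[C1;C2]] = [[C2]] ∘ [[C1]]
        return {s2 for s1 in ev(c[1], s) for s2 in ev(c[2], s1)}
    if tag == 'choose':                          # union of the two branches
        return ev(c[1], s) | ev(c[2], s)
    if tag == 'loop':                            # at most [[E]]s iterations
        results, frontier = {s}, {s}
        for _ in range(ev_expr(c[1], s)):
            frontier = {s2 for s1 in frontier for s2 in ev(c[2], s1)}
            results |= frontier
        return results
    raise ValueError(tag)

# loop X3 { X1 := X1 + X2 } may run 0, 1, or 2 iterations when x3 = 2:
prog = ('loop', ('var', 3), ('assign', 1, ('add', ('var', 1), ('var', 2))))
assert ev(prog, (0, 1, 2)) == {(0, 1, 2), (1, 1, 2), (2, 1, 2)}
```

Note how the loop case implements the existential over i ≤ [[E]]x by accumulating the stores reachable after every iteration count up to the bound.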

*Remarks.* The following two changes may enhance the applicability of the core language for simulating certain concrete programs; we include them as "options" because they do not affect the validity of our proofs.


#### **2.3 Detailed Statement of the Main Result**

The *polynomial-bound analysis problem* is to find, for any given command, which output variables are bounded by a polynomial in the input values (which are simply the values of all variables upon commencement of the program), and to bound these output values tightly (up to constant factors). The problem of *identifying* the polynomially-bounded variables is completely solved by [7]. We rely on that algorithm, which is polynomial-time, to do this for us (as further explained below).

Our main result is thus stated as follows.

**Theorem 1.** *There is an algorithm which, for a command* C *over variables* X1 *through* Xn*, outputs a set* B *of multi-polynomials such that the following hold, where* PB *is the set of indices* i *of variables* Xi *which are polynomially bounded under* [[C]]*.*

*1. (Bounding) There is a constant* c**p** *associated with each* **p** ∈ B*, such that*

$$\forall x, y:\ x\,[\![\mathtt{C}]\!]\,y \implies \exists \mathbf{p} \in \mathcal{B}.\ \forall i \in \mathsf{PB}.\ y_i \le c_{\mathbf{p}}\,\mathbf{p}[i](x)$$

*2. (Tightness) For every* **p** ∈ B *there are constants* d**p** > 0*, x*0 *such that for all x* ≥ *x*0 *there is a y such that*

$$x\,[\![\mathtt{C}]\!]\,y \ \text{ and } \ \forall i \in \mathsf{PB}.\ y_i \ge d_{\mathbf{p}}\,\mathbf{p}[i](x).$$

#### **3 Analysis Algorithm: First Concepts**

The following sections describe our analysis algorithm. Naturally, the most intricate part of the analysis concerns loops. In fact we break the description into stages: first we reduce the problem of analyzing any program to that of analyzing *simple disjunctive loops*, defined next. Then, we approach the analysis of such loops, which is the main effort in this work.

**Definition 2.** *A* simple disjunctive loop (SDL) *is a finite set of PTs.*

The loop is "disjunctive" because its meaning is that in every iteration, any of the given transitions may be applied. The semantics is formalized by *traces* (Definition 4). An SDL does not specify the number of iterations; our analysis generates polynomials which depend on the number of iterations as well as the initial state. For this purpose, we now introduce τ-polynomials, where τ represents the number of iterations.

**Definition 3.** τ *-polynomials are polynomials in* x1,...,x<sup>n</sup> *and* τ *.*

τ has a special status and does not have a separate component in the polynomial giving its value. If p is a τ -polynomial, then p(v1,...,vn) is the result of substituting each v<sup>i</sup> for the respective xi; and we also write p(v1,...,vn, t) for the result of substituting t for τ as well. The set of τ -polynomials in n variables (n known from context) is denoted τPol.

Multi-polynomials and polynomial transitions are formed from τ-polynomials just as previously defined, and are used to represent the effect of a variable number of iterations. For example, the τ-polynomial transition ⟨x′1, x′2⟩ = ⟨x1, x2 + τx1⟩ represents the effect of repeating (τ times) the assignment X2:= X<sup>2</sup> + X1. Iterating the composite command X2:= X<sup>2</sup> + X1; X3:= X<sup>3</sup> + X<sup>2</sup> has an effect described by **x**′ = ⟨x1, x2 + τx1, x3 + τx2 + τ<sup>2</sup>x1⟩ (here we already have an upper bound which is not reached precisely, but is correct up to a constant factor). We denote the set of τ-polynomial transitions by τ**MPol**. We should note that composition **q** ◦ **p** over τ**MPol** is performed by substituting **p**[i] for each occurrence of x<sup>i</sup> in **q**. Occurrences of τ are unaffected (since τ is not part of the state). We make a couple of preliminary definitions before reaching our goal, which is the definition of the *simple disjunctive loop problem* (Definition 6).

**Definition 4.** *Let* S *be a set of polynomial transitions. An* (abstract) trace *over* S *is a finite sequence* **p**1; ... ; **p**|σ| *of elements of* S*. Thus* |σ| *denotes the* length *of the trace* σ*. The set of all traces is denoted* S<sup>∗</sup>*. We write* [[σ]] *for the composed relation* **p**|σ| ◦ ··· ◦ **p**1 *(for the empty trace* ε *we have* [[ε]] = *Id ).*

**Definition 5.** *Let* p(**x**) *be a (concrete or abstract)* τ *-polynomial. We write* p˙ *for the sum of* linear monomials *of* p*, namely any one of the form* ax<sup>i</sup> *with constant coefficient* a*. We write* p¨ *for the rest. Thus* p = ˙p + ¨p*.*

**Definition 6 (Simple disjunctive loop problem).** *The* simple disjunctive loop problem *is: given the set* <sup>S</sup>*, find (if possible) a finite set* <sup>B</sup> *of* <sup>τ</sup> *-polynomial transitions which* tightly bound *all traces over* S*. More precisely, we require:*

*1. (Bounding) There is a constant* c**p** > 0 *associated with each* **p** ∈ B*, such that*

$$\forall x, y, \sigma:\ x\,[\![\sigma]\!]\,y \implies \exists \mathbf{p} \in \mathcal{B}.\ y \le c_{\mathbf{p}}\,\mathbf{p}(x, |\sigma|)$$

*2. (Tightness) For every* **p** ∈ B *there are constants* d**p** > 0*, x*0 *such that for all x* ≥ *x*0 *there are a trace* σ *and a state vector y such that*

$$x\,[\![\sigma]\!]\,y \ \land\ y \ge \dot{\mathbf{p}}(x, |\sigma|) + d_{\mathbf{p}}\,\ddot{\mathbf{p}}(x, |\sigma|)\,.$$

Note that in the lower-bound clause (2), the linear monomials of **p** are not multiplied, on the left-hand side, by the coefficient d**p**; this sets, in a sense, a stricter requirement for them: if the trace maps x to x<sup>2</sup>, then the bound 2x<sup>2</sup> is acceptable, but if it maps x to x, the bound 2x is not accepted. The reader may understand this technicality by considering the effect of iteration: it is important to distinguish the transition x′1 = x1, which can be iterated ad libitum, from the transition x′1 = 2x1, which produces exponential growth under iteration. Distinguishing x′1 = x1<sup>2</sup> from x′1 = 2x1<sup>2</sup> is not as important. The result set B above is sometimes called a *loop summary*. We remark that Definition 6 implies that the **max** of all these polynomials provides a "big Theta" bound for the worst-case (namely, biggest) results of the loop's computation. We prefer, however, to work with sets of polynomials. Another technical remark is that c**p**, d**p** range over the real numbers; our data and the coefficients of polynomials remain integers, and only the above comparisons are performed with real numbers (specifically, to allow c**p** to be smaller than one).

#### **4 Reduction to Simple Disjunctive Loops**

We show how to reduce the problem of analysing core-language programs to the analysis of polynomially-bounded simple disjunctive loops.

#### **4.1 Symbolic Evaluation of Straight-Line Code**

Straight-line code consists of atomic commands—namely assignments (or skip, equivalent to X1:= X1), composed sequentially. It is obvious that symbolic evaluation of such code leads to polynomial transitions.

*Example 1.* X2:= X1; X4:= X<sup>2</sup> + X3; X1:= X<sup>2</sup> \* X<sup>3</sup> is precisely represented by the transition ⟨x′1, x′2, x′3, x′4⟩ = ⟨x1x3, x1, x3, x1 + x3⟩.
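As a quick sanity check (our own, not from the paper), one can run the straight-line code on concrete stores and compare with the claimed transition:

```python
# Example 1, checked numerically: executing the three assignments in order
# agrees with the polynomial transition <x1*x3, x1, x3, x1+x3>.

def run(store):
    x1, x2, x3, x4 = store
    x2 = x1               # X2 := X1
    x4 = x2 + x3          # X4 := X2 + X3
    x1 = x2 * x3          # X1 := X2 * X3
    return (x1, x2, x3, x4)

def pt(store):
    x1, x2, x3, x4 = store
    return (x1 * x3, x1, x3, x1 + x3)

assert all(run(s) == pt(s) for s in [(2, 0, 5, 1), (7, 3, 4, 9)])
```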

#### **4.2 Evaluation of Non-deterministic Choice**

Evaluation of the command choose C<sup>1</sup> or C<sup>2</sup> yields a set of possible outcomes. Hence, the result of analyzing a command will be a *set* of multi-polynomial transitions. We express this in the common notation of abstract semantics:

$$[\![\mathtt{C}]\!]^S \in \wp(\mathbf{MPol})\,.$$

For uniformity, we consider [[C]]<sup>S</sup> for an atomic command to be a singleton in <sup>℘</sup>(MPol) (this means that we represent a transition *<sup>x</sup>* <sup>=</sup> **<sup>p</sup>**(*x*) by {**p**}). Composition is naturally extended to sets, and the semantics of a choice command is now simply set union, so we have:

$$\begin{aligned} [\![\mathtt{C}_1;\mathtt{C}_2]\!]^S &= [\![\mathtt{C}_2]\!]^S \circ [\![\mathtt{C}_1]\!]^S \\ [\![\texttt{choose } \mathtt{C}_1 \texttt{ or } \mathtt{C}_2]\!]^S &= [\![\mathtt{C}_1]\!]^S \cup [\![\mathtt{C}_2]\!]^S \end{aligned}$$

*Example 2.* X2:= X1; choose { X4:= X<sup>2</sup> + X<sup>3</sup> } or { X1:= X<sup>2</sup> \* X<sup>3</sup> } is represented by the set {⟨x1, x1, x3, x1 + x3⟩, ⟨x1x3, x1, x3, x4⟩}.

#### **4.3 Handling Loops**

The above shows that any loop-free command in our language can be precisely represented by a finite set of PTs. Consequently, the problem of analyzing *any* command is reduced to the analysis of simple disjunctive loops.

Suppose that we have an algorithm Solve that takes a simple disjunctive loop and computes tight bounds for it (see Definition 6). We use it to complete the analysis of any program by the following definition:

$$[\![\texttt{loop } \mathtt{E}\ \{\mathtt{C}\}]\!]^S = \big(\textsc{Solve}([\![\mathtt{C}]\!]^S)\big)[\mathtt{E}/\tau]\,.$$

Thus, the whole solution is constructed as an ordinary abstract interpretation, following the semantics of the language, except for procedure Solve, described below.

*Example 3.* X4:= X1; loop X<sup>4</sup> { X2:= X<sup>1</sup> + X2; X3:= X<sup>2</sup> }. The loop includes just one PT. Solving the loop yields a set L = {⟨x1, x2, x3, x4⟩, ⟨x1, x2 + τx1, x2 + τx1, x4⟩} (the first MP accounts for zero iterations, the second covers any positive number of iterations). We can now compute the effect of the given command as

$$\begin{aligned} \mathcal{L}[x_4/\tau] \circ [\![\mathtt{X}_4 := \mathtt{X}_1]\!]^S &= \mathcal{L}[x_4/\tau] \circ \{\langle x_1, x_2, x_3, x_1\rangle\} \\ &= \{\langle x_1, x_2, x_3, x_1\rangle,\ \langle x_1,\ x_2 + x_1^2,\ x_2 + x_1^2,\ x_1\rangle\}\,. \end{aligned}$$

The next section describes procedure Solve, which operates under the assumption that all variables are polynomially bounded in the loop. However, a loop can generate exponential growth. To cover this eventuality, we first apply the algorithm of [7], which identifies the variables that are polynomially bounded. If some X<sup>i</sup> is *not* polynomially bounded, we replace the ith component of all the loop transitions with x<sup>n</sup> (where we assume x<sup>n</sup> to be a dedicated, unmodified variable). Clearly, after this change, all variables are polynomially bounded; moreover, variables which are genuinely polynomial are unaffected, because they cannot depend on a super-polynomially growing quantity (given the restricted arithmetic in our language). In reporting the results of the algorithm, we should display "super-polynomial" instead of all bounds that depend on xn.

#### **5 Simple Disjunctive Loop Analysis Algorithm**

Intuitively, evaluating loop E {C} abstractly consists of simulating any finite number of iterations, i.e., computing

$$Q_i = \{Id\} \cup P \cup (P \circ P) \cup \dots \cup P^{(i)} \tag{1}$$

where P = [[C]]<sup>S</sup> ∈ ℘(MPol). The question now is whether the sequence (1) reaches a fixed point. In fact, it often does not. However, it is quite easy to see that in the *multiplicative fragment* of the language, that is, where the addition operator is not used, such non-convergence is associated with exponential growth. Indeed, since there is no addition, all our polynomials are monomials with a leading coefficient of 1 (*monic monomials*)—this is easy to verify. It follows that if the sequence (1) does not converge, higher and higher exponents must appear, which indicates that some variable cannot be bounded polynomially. Taking the contrapositive, we conclude that if all variables are known to be polynomially bounded the sequence will converge. Thus we have the following easy (and not so satisfying) result:

**Observation 2.** *For a SDL that does not use addition, the sequence* Q<sup>i</sup> *as in* (1) *reaches a fixed point, and the fixed point provides tight bounds for all the polynomially-bounded variables.*

When we have addition, knowing that all variables are polynomially bounded does not imply convergence of the sequence (1). An example is loop X<sup>3</sup> { X1:= X<sup>1</sup> + X<sup>2</sup> }, yielding the infinite sequence of MPs ⟨x1, x2, x3⟩, ⟨x1 + x2, x2, x3⟩, ⟨x1 + 2x2, x2, x3⟩, ... Our solution employs two means. One is the introduction of τ-polynomials, already presented. The other is a kind of *abstraction*: intuitively, ignoring the concrete values of (non-zero) coefficients. Let us first define this abstraction:
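The divergence of this chain can be observed concretely (a small illustration of ours): starting with x1 = 0 and x2 = 1, the value of x1 after i iterations reads off the growing coefficient of x2, so every iteration count yields a genuinely new MP.

```python
# The body X1 := X1 + X2 iterated i times acts as x1 + i*x2; the coefficient
# of x2 grows without bound, so sequence (1) gains a new MP at every step,
# even though all values remain (linearly) polynomially bounded.

def body(store):
    x1, x2, x3 = store
    return (x1 + x2, x2, x3)

store = (0, 1, 7)          # x1 = 0, x2 = 1: x1 tracks the coefficient of x2
for i in range(1, 6):
    store = body(store)
    assert store[0] == i
```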

**Definition 7.** APol*, the set of abstract polynomials, consists of formal sums of distinct monomials over* x1,...,xn*, where the coefficient of every monomial included is* 1*. We extend the definition to an abstraction of* τ *-polynomials, denoted* τAPol*.*

The meaning of abstract polynomials is given by the following rules:


*Analysing a SDL.* To analyse a SDL specified by a set of MPs S, we start by computing <sup>α</sup>(S). The rest of the algorithm computes within <sup>τ</sup>AMPol. We define two operations that are combined in the analysis of loops. The first, which we call *closure*, is simply the fixed point of accumulated iterations as in the multiplicative case. It is introduced by the following two definitions.

**Definition 8 (iterated composition).** *Let* **t** *be any abstract* τ *-MP. We define* **<sup>t</sup>**•(n)*, for* <sup>n</sup> <sup>≥</sup> <sup>0</sup>*, by:*

$$\begin{aligned} \mathbf{t}^{\bullet^{(0)}} &= Id \\ \mathbf{t}^{\bullet^{(n+1)}} &= \mathbf{t} \bullet \mathbf{t}^{\bullet^{(n)}}. \end{aligned}$$

*For a set* <sup>T</sup> *of abstract* <sup>τ</sup> *-MPs, we define, for* <sup>n</sup> <sup>≥</sup> <sup>0</sup>*:*

$$\begin{aligned} T^{\bullet(0)} &= \{Id\} \\ T^{\bullet(n+1)} &= T^{\bullet(n)} \cup \bigcup_{\mathbf{q} \in T,\ \mathbf{p} \in T^{\bullet(n)}} \mathbf{q} \bullet \mathbf{p}\,. \end{aligned}$$

Note that **t**•(n) = α(γ(**t**)(n)), where **p**(n) is defined using ordinary composition.

**Definition 9 (abstract closure).** *For finite* <sup>P</sup> <sup>⊂</sup> <sup>τ</sup>AMPol*, we define:*

$$Cl(P) = \bigcup\_{i=0}^{\infty} P^{\bullet(i)}\,.$$

In the correctness proof, we argue that when all variables are polynomially bounded in a loop S, the closure of α(S) can be computed in finite time; equivalently, it equals $\bigcup_{i=0}^{k} (\alpha(S))^{\bullet(i)}$ for some k. The argument is essentially the same as in the multiplicative case.
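The closure computation lends itself to a direct implementation. The following Python sketch is our own (representation choices included): abstract polynomials are frozensets of exponent tuples, abstract composition substitutes and drops coefficients, and the closure iterates until no new abstract MPs appear.

```python
from itertools import product

# Abstract polynomial (Definition 7): a frozenset of monomials, each monomial
# an exponent tuple over x1..xn (all coefficients abstracted to 1).

def acompose(q, p, n):
    """Abstract composition q • p: substitute p[i] for x_i, drop coefficients."""
    result = []
    for comp in q:
        mons = set()
        for m in comp:
            factors = []
            for i, e in enumerate(m):
                factors.extend([p[i]] * e)        # one factor per unit of degree
            if not factors:                       # constant monomial
                mons.add(tuple([0] * n))
                continue
            for choice in product(*factors):      # distribute over the sums
                mons.add(tuple(sum(col) for col in zip(*choice)))
        result.append(frozenset(mons))
    return tuple(result)

def closure(S, n, max_rounds=50):
    """Cl(S) (Definition 9), assuming the accumulation converges."""
    ident = tuple(frozenset([tuple(1 if j == i else 0 for j in range(n))])
                  for i in range(n))
    T = {ident} | set(S)
    for _ in range(max_rounds):
        new = {acompose(q, p, n) for q in S for p in T} - T
        if not new:
            return T
        T |= new
    raise RuntimeError("no convergence: variables may not be polynomially bounded")

def mon(i, n):
    return tuple(1 if j == i else 0 for j in range(n))

# the body of Example 7 below: p = <x1+x2, x2+x3, x3, x3>, abstracted
p = (frozenset({mon(0, 4), mon(1, 4)}),
     frozenset({mon(1, 4), mon(2, 4)}),
     frozenset({mon(2, 4)}),
     frozenset({mon(2, 4)}))
T = closure([p], 4)
assert len(T) == 3   # Id, p, and p • p = <x1+x2+x3, x2+x3, x3, x3>
```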

The second operation is called *generalization* and its role is to capture the behaviour of accumulator variables, meaning variables that grow by accumulating increments in the loop, and make explicit the dependence on the number of iterations. The identification of which additive terms in a MP should be considered as increments that accumulate is at the heart of our problem, and is greatly simplified by concentrating on idempotent AMPs.

**Definition 10.** **p** ∈ τAMPol *is called* idempotent *if* **p** • **p** = **p***.*

Note that this is composition in the abstract domain. So, for instance, ⟨x1, x2⟩ is idempotent, and so is ⟨x1 + x2, x2⟩, while ⟨x1x2, x2⟩ and ⟨x1 + x2, x1⟩ are not.

**Definition 11.** *For* **p** *an (abstract) multi-polynomial, we say that* x<sup>i</sup> *is* self-dependent *in* **p** *if* **p**[i] *depends on* xi*. We call a monomial self-dependent if all the variables appearing in it are.*

**Definition 12.** *We define a notational convention for* τ*-MPs. Assuming that* **p**[i] *depends on* xi*, we write*

$$\mathbf{p}[i] = x_i + \tau\,\mathbf{p}[i]' + \mathbf{p}[i]'' + \mathbf{p}[i]''',$$

*where* **p**[i]‴ *includes all the non-self-dependent monomials of* **p**[i]*, while the self-dependent monomials (other than* xi*) are grouped into two sums:* τ**p**[i]′*, including all monomials with a positive degree of* τ*, and* **p**[i]″*, which includes all the* τ*-free monomials.*

*Example 4.* Let **p** = ⟨x1 + τx2 + τx3 + x3x4, x3, x3, x4⟩. The self-dependent variables are all but x2. Since x<sup>1</sup> is self-dependent, we apply the above definition to **p**[1], so that **p**[1]′ = x3, **p**[1]″ = x3x<sup>4</sup> and **p**[1]‴ = τx2. Note that a factor of τ is stripped in **p**[1]′. Had the monomial been τ<sup>2</sup>x3, we would have **p**[1]′ = τx3.

**Definition 13 (generalization).** *Let* **p** *be idempotent in* τAMPol*; define* **p**<sup>τ</sup> *by*

$$\mathbf{p}^{\tau}[i] = \begin{cases} x_i + \tau\,\mathbf{p}[i]' + \tau\,\mathbf{p}[i]'' + \mathbf{p}[i]''' & \text{if } \mathbf{p}[i] \text{ depends on } x_i\\ \mathbf{p}[i] & \text{otherwise.} \end{cases}$$

Note that the arithmetic here is abstract (see examples below). Note also that in the term τ**p**[i]′ the τ is already present in **p**, while in τ**p**[i]″ it is added to the existing monomials. In this definition, the monomials of **p**[i]″ are treated like those of τ**p**[i]′; however, in certain steps of the proofs we treat them differently, which is why the notation separates them.

*Example 5.* Let **p** = ⟨x1 + x3, x2 + x3 + x4, x3, x3⟩.

Note that **p** • **p** = **p**. We have **p**<sup>τ</sup> = ⟨x1 + τx3, x2 + τx3 + x4, x3, x3⟩.

*Example 6.* Let **p** = ⟨x1 + τx2 + τx3 + τx3x4, x3, x3, x4⟩.

Note that **p** • **p** = **p**. The self-dependent variables are all but x2.

We have **p**<sup>τ</sup> = ⟨x1 + τx2 + τx3 + τx3x4, x3, x3, x4⟩ = **p**.
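Generalization can likewise be sketched in code (our own illustration and encoding, not the paper's: monomials are exponent tuples of length n+1, the last slot holding the degree of τ). Applied to Example 5 it produces ⟨x1 + τx3, x2 + τx3 + x4, x3, x3⟩:

```python
# generalize(p, n) implements Definition 13 on abstract tau-MPs: components
# without self-dependence are kept; elsewhere, tau-free self-dependent
# monomials (other than x_i itself) are multiplied by tau.

def generalize(p, n):
    selfdep = {i for i in range(n) if any(m[i] > 0 for m in p[i])}

    def mono_selfdep(m):
        return all(i in selfdep for i in range(n) if m[i] > 0)

    def unit(i):
        return tuple(1 if j == i else 0 for j in range(n + 1))

    result = []
    for i in range(n):
        if i not in selfdep:
            result.append(p[i])                  # p^tau[i] = p[i]
            continue
        mons = set()
        for m in p[i]:
            if m == unit(i) or not mono_selfdep(m) or m[n] > 0:
                mons.add(m)                      # x_i, p[i]''' and tau*p[i]' kept
            else:
                mons.add(m[:n] + (m[n] + 1,))    # p[i]'': multiply by tau
        result.append(frozenset(mons))
    return tuple(result)

# Example 5: p = <x1+x3, x2+x3+x4, x3, x3>, n = 4 (tau-degree 0 everywhere)
def unit5(i):
    return tuple(1 if j == i else 0 for j in range(5))

p = (frozenset({unit5(0), unit5(2)}),
     frozenset({unit5(1), unit5(2), unit5(3)}),
     frozenset({unit5(2)}),
     frozenset({unit5(2)}))
ptau = generalize(p, 4)
# p^tau = <x1 + tau*x3, x2 + tau*x3 + x4, x3, x3>
assert ptau[0] == frozenset({unit5(0), (0, 0, 1, 0, 1)})
```

Note how x4 in **p**[2] survives unchanged: x4 is not self-dependent, so that monomial belongs to **p**[2]‴.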

Finally we can present the analysis of the loop command.

**Algorithm** Solve(S)

Input: S, a polynomially-bounded simple disjunctive loop. Output: a set of τ-MPs which tightly approximates the effect of all S-traces.

1. Set T = α(S).
2. Repeat until T is unchanged:

	- (a) Closure: Set T to Cl(T).
	- (b) Generalization: For all **p** ∈ T such that **p** • **p** = **p**, add **p**<sup>τ</sup> to T.

*Example 7.* loop X<sup>3</sup> { X1:= X<sup>1</sup> + X2; X2:= X<sup>2</sup> + X3; X4:= X<sup>3</sup> } The body of the loop is evaluated symbolically and yields the multi-polynomial:

$$\mathbf{p} = \langle x\_1 + x\_2, \ x\_2 + x\_3, \ x\_3, \ x\_3 \rangle$$

Now, computing within AMPol,

$$\begin{aligned} \alpha(\mathbf{p})^{\bullet(2)} &= \alpha(\mathbf{p}) \bullet \alpha(\mathbf{p}) = \langle x\_1 + x\_2 + x\_3, \ x\_2 + x\_3, \ x\_3, \ x\_3 \rangle; \\ \alpha(\mathbf{p})^{\bullet(3)} &= \alpha(\mathbf{p})^{\bullet(2)}. \end{aligned}$$

Here the closure computation stops. Since α(**p**)<sup>•(2)</sup> is idempotent, we compute

$$\mathbf{q} = (\alpha(\mathbf{p})^{\bullet(2)})^\tau = \langle x\_1 + \tau x\_2 + \tau x\_3, \ x\_2 + \tau x\_3, \ x\_3, \ x\_3 \rangle$$

and applying closure again, we obtain some additional results:

$$\begin{array}{lcl} \mathbf{q}\bullet\alpha(\mathbf{p}) &=& \langle x\_1 + x\_2 + x\_3 + \tau x\_2 + \tau x\_3, \ x\_2 + x\_3 + \tau x\_3, \ x\_3, \ x\_3 \rangle \\ \mathbf{(q)}^{\bullet(2)} &=& \langle x\_1 + \tau x\_2 + \tau x\_3 + \tau^2 x\_3, \ x\_2 + \tau x\_3, \ x\_3, \ x\_3 \rangle \\ \mathbf{(q)}^{\bullet(2)}\bullet\alpha(\mathbf{p}) &=& \langle x\_1 + x\_2 + x\_3 + \tau x\_2 + \tau x\_3 + \tau^2 x\_3, \ x\_2 + x\_3 + \tau x\_3, \ x\_3, \ x\_3 \rangle \\ \end{array}$$

The last element is idempotent but applying generalization does not generate anything new. Thus the algorithm ends. The reader may reconsider the source code to verify that we have indeed obtained tight bounds for the loop.
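A numeric spot check (ours, not from the paper): running the loop body t times from a concrete store never exceeds the generalized bound ⟨x1 + τx2 + τx3 + τ²x3, x2 + τx3, x3, x3⟩ with τ := t.

```python
# Sanity check of Example 7: iterate the body and compare with the tau-MP
# q = <x1 + tau*x2 + tau*x3 + tau^2*x3, x2 + tau*x3, x3, x3>.

def body(s):
    x1, x2, x3, x4 = s
    return (x1 + x2, x2 + x3, x3, x3)    # X1 += X2; X2 += X3; X4 := X3

def bound(s, t):
    x1, x2, x3, x4 = s
    return (x1 + t*x2 + t*x3 + t*t*x3, x2 + t*x3, x3, x3)

s0 = (3, 2, 1, 0)
s = s0
for t in range(1, 20):
    s = body(s)
    assert all(si <= bi for si, bi in zip(s, bound(s0, t)))
```

Here x2 meets its bound exactly (x2 + t·x3), while x1 stays within a constant factor of the τ²-term, consistent with tightness up to constants.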

### **6 Correctness**

We claim that our algorithm obtains a description of the worst-case results of the program that is precise up to constant factors. That is, we claim that the set of MPs returned provides an upper bound (on all executions) which is also tight; tightness means that every MP returned is also a lower bound (up to a constant factor) on an infinite sequence of possible executions. Unfortunately, due to space constraints, we are not able to give full details of the proofs here; however, we give the main highlights. Intuitively, what we want to prove is that the multipolynomials we compute cover all "behaviors" of the loop. More precisely, in the upper-bound part of the proof we want to cover all behaviors: upper-bounding is a universal statement. To prove that bounds are tight, we show that each such bound constitutes a *lower bound* on a certain "worst-case behavior": tightness is an existential statement. The main aspects of these proofs are as follows:


#### **7 Related Work**

Bound analysis, in the sense of finding symbolic bounds for data values, iteration bounds and related quantities, is a classic field of program analysis [18,24,27]. It is also an area of active research, with tools being currently (or recently) developed including COSTA [1], AProVE [13], CiaoPP [19], C4B [11], Loopus [26]—all for imperative programs. There is also work on functional and logic programs, term rewriting systems, recurrence relations, etc. which we cannot attempt to survey here. In the rest of this section we survey work which is more directly related to ours, and has even inspired it.

The LOOP language is due to Meyer and Ritchie [20], who note that it computes only primitive recursive functions, but complexity can rise very fast, even for programs with nesting-depth 2. Subsequent work [15–17,22] concerning similar languages attempted to analyze such programs more precisely; most of them proposed syntactic criteria, or analysis algorithms, that are sufficient for ensuring that the program lies in a desired class (often, polynomial-time programs), but are not both necessary and sufficient: thus, they do not prove decidability (the exception is [17] which has a decidability result for a weak "core" language). The core language we use in this paper is from Ben-Amram et al. [7], who observed that by introducing weak bounded loops instead of concrete loop commands and non-deterministic branching instead of "if", we have weakened the semantics just enough to obtain decidability of polynomial growth-rate. Justifying the necessity of these relaxations, [8] showed undecidability for a language that can only do addition and definite loops (that cannot exit early).

In the vast literature on bound analysis in various forms, there are a few other works that give a complete solution for a weak language. *Size-change programs* are considered by [12,28]. Size-change programs abstract away nearly everything in the program, leaving a control-flow graph annotated with assertions about variables which decrease (or do not increase) in a transition. Thus, it does not assume structured and explicit loops, and it cannot express information about values which increase. Both works yield tight bounds on the number of transitions until termination.

Dealing with a somewhat different problem, [14,21] both check, or find, *invariants* in the form of polynomial equations. We find it remarkable that they give complete solutions for weak languages, where the weakness lies in the non-deterministic control-flow, as in our language. If one could give a complete solution for polynomial *inequalities*, this would have implied a solution to our problem as well.

#### **8 Conclusion and Further Work**

We have solved an open problem in the area of analyzing programs in a simple language with bounded loops. For our language, it has been previously shown that it is possible to decide whether a variable's value, number of steps in the program, etc. are polynomially bounded or not. Now, we have an algorithm that computes tight polynomial bounds on the final values of variables in terms of initial values. The bounds are tight up to constant factors (suitable constants are also computable). This result improves our understanding of what is computable by, and about, programs of this form. An interesting corollary of our algorithm is that as long as variables are *polynomially bounded*, their worst-case bounds are described tightly by (multivariate) *polynomials*. This is, of course, not true for common Turing-complete languages. Another interesting corollary of the *proofs* is the definition of a simple class of patterns that suffice to realize the worst-case behaviors. This will appear in a planned extended version of this paper.

There are a number of possible directions for further work. We would like to look for decidability results for richer (yet, obviously, sub-recursive) languages. Some possible language extensions include deterministic loops, variable resets (cf. [4]), explicit constants, and procedures. The inclusion of explicit constants is a particularly challenging open problem.

Rather than extending the language, we could extend the range of bounds that we can compute. In light of the results in [17], it seems plausible that the approach can be extended to classify the Grzegorczyk-degree of the growth rate of variables when they are super-polynomial. There may also be room for progress regarding precise bounds of the form 2poly.

In terms of time complexity, our algorithm is polynomial in the size of the program times n<sup>nd</sup>, where d is the highest degree of any MP computed. Such exponential behavior is to be expected, since a program can easily be written to compute a multivariate polynomial that is exponentially long to write. But there is still room for finer investigation of this issue.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Complete Normal-Form Bisimilarity for State**

Dariusz Biernacki<sup>1</sup>, Sergueï Lenglet<sup>2</sup>, and Piotr Polesiuk<sup>1</sup>

> <sup>1</sup> University of Wroclaw, Wroclaw, Poland {dabi,ppolesiuk}@cs.uni.wroc.pl <sup>2</sup> Université de Lorraine, Nancy, France serguei.lenglet@univ-lorraine.fr

**Abstract.** We present a sound and complete bisimilarity for an untyped λ-calculus with higher-order local references. Our relation compares values by applying them to a fresh variable, like normal-form bisimilarity, and it uses environments to account for the evolving store. We achieve completeness by a careful treatment of evaluation contexts comprising open stuck terms. This work improves over Støvring and Lassen's incomplete environment-based normal-form bisimilarity for the λρ-calculus, and confirms, in relatively elementary terms, Jaber and Tabareau's result, that the state construct is discriminative enough to be characterized with a bisimilarity without any quantification over testing arguments.

#### **1 Introduction**

Two terms are contextually equivalent if replacing one by the other in a bigger program does not change the behavior of the program. The quantification over program contexts makes contextual equivalence hard to use in practice, and it is therefore common to look for more effective characterizations of this relation. In a calculus with local state, such a characterization has been achieved either through *logical relations* [1,5,15], which rely on types, through denotational models [6,10,13], or through coinductively defined *bisimilarities* [9,12,17–19].

Koutavas et al. [8] argue that to be sound w.r.t. contextual equivalence, a bisimilarity for state should accumulate the tested terms in an environment to be able to try them again as the store evolves. Such *environmental bisimilarities* usually compare terms by applying them to arguments built from the environment [12,17,19], and therefore still rely on some universal quantification over testing arguments. An exception is Støvring and Lassen's bisimilarity [18], which compares terms by applying them to a fresh variable, like one would do with a *normal-form* (or *open*) bisimilarity [11,16]. Their bisimilarity characterizes contextual equivalence in a calculus with control and state, but is not *complete* in a calculus with state only: there exist equivalent terms that are not related by the bisimilarity. Jaber and Tabareau [6] go further and propose a sound and complete *Kripke Open Bisimilarity* for a calculus with local state, which also compares terms by applying them to a fresh variable, but uses notions from Kripke logical relations, namely transition systems of invariants, to reason about heaps.


In this paper, we propose a sound and complete normal-form bisimilarity for a call-by-value λ-calculus with local references which relies on environments to handle heaps. We therefore improve over Støvring and Lassen's work, since our relation is complete, and we follow a different, potentially simpler, path than Jaber and Tabareau, since we use environments to represent possible worlds and do not rely on any external structures such as transition systems of invariants. Moreover, we do not need types, and we define our relation in an untyped calculus.

We obtain completeness by carefully treating normal forms that are not values, i.e., open stuck terms of the form E[x v]. First, we distinguish in the environment the terms which should be tested multiple times from the ones that should be run only once, namely the evaluation contexts like E in the above term. The latter are kept in a separate environment that takes the form of a stack, following the idea presented by Laird [10] and by Jagadeesan et al. [7]. Second, we relate the so-called *deferred diverging* terms [5,6], i.e., open stuck terms which hide a diverging behavior in the evaluation context E, to the regular diverging terms.

It may be worth stressing that our congruence proof is based on the machinery we have developed before [3] and is simpler than Støvring and Lassen's, in particular in how it accounts for the extensionality of functions.

We believe that this work makes a contribution to the understanding of how one should adjust the normal-form bisimulation proof principle when the calculus under consideration becomes less discriminative, assuming that one wishes to preserve completeness of the theory. In particular, it is quite straightforward to define a complete normal-form bisimilarity for the λ-calculus with first-class continuations and global store, with no need to refer to other notions than the ones already present in the reduction semantics. Similarly, in the λμρ-calculus (continuations and local references), one only needs to introduce environments to ensure soundness of the theory, but essentially nothing more is required to obtain completeness [18]. In this article we show which new ingredients are needed when moving from these two highly expressive calculi to the corresponding, less discriminative ones—with global or local references only—that do not offer access to the current continuation.

The rest of this paper is organized as follows. In Sect. 2, we study a simple calculus with global store to see how to reach completeness in that case. In particular, we show in Sect. 2.2 how we deal with deferred diverging terms. We recall in Sect. 2.3 the notion of *diacritical progress* [3] and the framework our bisimilarity and its proof of soundness are based upon. We sketch the completeness proof in Sect. 2.4. Section 2 paves the way for the main result of the paper, described in Sect. 3, where we turn to the calculus with local store. We define the bisimilarity in Sect. 3.2, prove its soundness and completeness in Sect. 3.3, and use it in Sect. 3.4 on examples taken from the literature. We conclude in Sect. 4, where we discuss related work and in particular compare our work to Jaber and Tabareau's. A companion report expands on the proofs [4].

# **2 Global Store**

We first consider a calculus where terms share a global store and present how we deal with deferred diverging terms to get a complete bisimilarity.

#### **2.1 Syntax, Semantics, and Contextual Equivalence**

We extend the call-by-value λ-calculus with the ability to read and write a global memory. We let x, y, ... range over term variables and l range over references. A *store*, denoted by h, g, is a finite map from references to values; we write dom(h) for the domain of h, i.e., the set of references on which h is defined. We write ∅ for the empty store, and h ⊎ g for the union of two stores, assuming dom(h) ∩ dom(g) = ∅. The syntax of terms and contexts is defined as follows.

$$t, s ::= v \mid t\, s \mid\, !l \mid l := t; s \qquad\quad v, w ::= x \mid \lambda x.t \qquad\quad E, F ::= \square \mid E\, t \mid v\, E \mid l := E; t$$
The term l := t; s evaluates t (if possible) and stores the resulting value in l before continuing as s, while !l reads the value kept in l. When writing examples and in the completeness proofs, we use natural numbers, booleans, the conditional if ... then ... else ..., local definitions let... in ..., sequence ;, and unit () assuming the usual call-by-value encodings for these constructs.
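For concreteness, one standard set of call-by-value encodings matching these constructs is sketched below; this is only one common choice, as the paper merely assumes some such encodings exist:

$$t; s \stackrel{\text{def}}{=} (\lambda z.s)\, t \ \ (z \text{ fresh}) \qquad \mathsf{let}\ x = t\ \mathsf{in}\ s \stackrel{\text{def}}{=} (\lambda x.s)\, t \qquad () \stackrel{\text{def}}{=} \lambda x.x$$

Booleans, natural numbers, and the conditional can then be any of the usual call-by-value encodings, e.g., Church encodings with thunked branches for if ... then ... else ....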

A λ-abstraction λx.t binds x in t; we write fv(t) (respectively fv(E)) for the set of free variables of t (respectively E). We identify terms up to α-conversion of their bound variables. A variable or reference is *fresh* if it does not occur in any other entities under consideration, and a store is fresh if it maps references to pairwise distinct fresh variables. A term or context is *closed* if it has no free variables. We write fr(t) for the set of references that occur in t.

The call-by-value semantics of the calculus is defined on *configurations* ⟨h | t⟩ such that fr(t) ⊆ dom(h) and for all l ∈ dom(h), fr(h(l)) ⊆ dom(h). We let c and d range over configurations. We write t{v/x} for the usual capture-avoiding substitution of x by v in t, and we let ∫ range over simultaneous substitutions {v_1/x_1} ... {v_n/x_n}. We write h[l := v] for the operation updating the value of l to v. The reduction semantics → is defined by the following rules.

$$\langle h \mid (\lambda x.t)\, v \rangle \to \langle h \mid t\{v/x\} \rangle \qquad \langle h \mid\, !l \rangle \to \langle h \mid h(l) \rangle$$

$$\langle h \mid l := v; t \rangle \to \langle h[l := v] \mid t \rangle \qquad \langle h \mid E[t] \rangle \to \langle g \mid E[s] \rangle \ \text{ if } \langle h \mid t \rangle \to \langle g \mid s \rangle$$

The well-formedness condition on configurations ensures that a read operation !l cannot fail. We write →<sup>∗</sup> for the reflexive and transitive closure of →.

A term t of a configuration ⟨h | t⟩ which cannot reduce further is called a *normal form*. Normal forms are either values or *open-stuck terms* of the form E[x v]; closed normal forms can only be λ-abstractions. A configuration *terminates*, written c ⇓, if it reduces to a normal-form configuration; otherwise it *diverges*, written c ⇑, like configurations running Ω def= (λx.x x) (λx.x x).
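To make the four reduction rules concrete, here is a minimal small-step interpreter sketch in Python. The tuple encoding of terms, the dict encoding of stores, and the fuel bound standing in for divergence are our own illustrative choices, not part of the paper's formalization:

```python
# Small-step interpreter sketch for the call-by-value calculus with a
# global store. Terms are tagged tuples:
#   ('var', x)  ('lam', x, t)  ('app', t, s)  ('get', l)  ('set', l, t, s)
# where ('set', l, t, s) is  l := t; s  and ('get', l) is  !l.
# Bound variables are assumed pairwise distinct (no capture handling).

def is_value(t):
    return t[0] in ('var', 'lam')

def subst(t, x, v):
    """t{v/x}, assuming no capture."""
    tag = t[0]
    if tag == 'var':
        return v if t[1] == x else t
    if tag == 'lam':
        return t if t[1] == x else ('lam', t[1], subst(t[2], x, v))
    if tag == 'app':
        return ('app', subst(t[1], x, v), subst(t[2], x, v))
    if tag == 'get':
        return t
    return ('set', t[1], subst(t[2], x, v), subst(t[3], x, v))

def step(h, t):
    """One step on <h | t>; None if t is a normal form (value or open-stuck)."""
    tag = t[0]
    if tag == 'app':
        f, a = t[1], t[2]
        if f[0] == 'lam' and is_value(a):               # beta_v
            return h, subst(f[2], f[1], a)
        if is_value(f):                                 # context v E
            r = step(h, a)
            return (r[0], ('app', f, r[1])) if r else None
        r = step(h, f)                                  # context E t
        return (r[0], ('app', r[1], a)) if r else None
    if tag == 'get':                                    # <h | !l> -> <h | h(l)>
        return h, h[t[1]]
    if tag == 'set':                                    # <h | l := v; t> -> <h[l := v] | t>
        l, u, s = t[1], t[2], t[3]
        if is_value(u):
            g = dict(h); g[l] = u
            return g, s
        r = step(h, u)
        return (r[0], ('set', l, r[1], s)) if r else None
    return None                                         # values and open-stuck terms

def eval_conf(h, t, fuel=200):
    """Reduce to a normal-form configuration, or None when fuel runs out."""
    while fuel:
        r = step(h, t)
        if r is None:
            return h, t
        h, t = r
        fuel -= 1
    return None

idf = ('lam', 'x', ('var', 'x'))
omega = ('app', ('lam', 'x', ('app', ('var', 'x'), ('var', 'x'))),
                ('lam', 'x', ('app', ('var', 'x'), ('var', 'x'))))

print(eval_conf({}, ('set', 'l', idf, ('get', 'l'))))  # stores idf in l, then reads it back
print(eval_conf({}, omega))                            # None: Omega diverges
```

Note that an open-stuck term such as y idf is a normal form for `step`, matching the definition above.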

*Contextual equivalence* equates terms behaving the same in all contexts. A substitution ∫ closes a term t if t∫ is closed; it closes a configuration ⟨h | t⟩ if it closes t and the values in h.

**Definition 1.** t *and* s *are contextually equivalent, written* t ≡ s*, if for all contexts* E*, fresh stores* h*, and closing substitutions* ∫*,* ⟨h | E[t]∫⟩ ⇓ *iff* ⟨h | E[s]∫⟩ ⇓*.*

Testing only evaluation contexts is not a restriction, as it implies equivalence w.r.t. all contexts ≡_C: one can show that t ≡_C s iff λx.t ≡_C λx.s iff λx.t ≡ λx.s.

#### **2.2 Normal-Form Bisimulation**

*Informal Presentation.* Two open terms are normal-form bisimilar if their normal forms can be decomposed into bisimilar subterms. For example, in the plain λ-calculus, a stuck term E[x v] is bisimilar to t if t reduces to a stuck term F[x w] such that E and F, and v and w, are bisimilar when respectively plugged with and applied to a fresh variable.

Such a requirement is too discriminating for many languages, as it distinguishes terms that should be equivalent. For instance, in the plain λ-calculus, given a closed value v, t def= x v is not normal-form bisimilar to s def= (λy.x v) (x v). Indeed, the empty context □ is not bisimilar to (λy.x v) □ when plugged with a fresh z: the former produces a value z, while the latter reduces to a stuck term x v. However, t and s are contextually equivalent, as for all closed values w, t{w/x} and s{w/x} behave like w v: if w v diverges, then they both diverge, and if w v evaluates to some value w′, then they also evaluate to w′. Similarly, x v Ω and Ω are not normal-form bisimilar (one is a stuck term while the other is diverging), but they are contextually equivalent by the same reasoning.

The terms t and s are no longer contextually equivalent in a λ-calculus with store, since a function can count how many times it is applied and change its behavior accordingly. More precisely, t and s are distinguished by the context l := 0; (λx.□) (λz.l := !l + 1; if !l = 1 then 0 else Ω). But this counting trick is not enough to discriminate x v Ω and Ω, as they are still equivalent in a λ-calculus with store. Although x v Ω is a normal form, it is in fact always diverging when we replace x by an arbitrary closed value w, either because w v itself diverges, or because it evaluates to some w′ and then w′ Ω diverges. A stuck term which hides a diverging behavior has been called *deferred diverging* in the literature [5,6].
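The counting trick can be replayed in Python: a mutable cell plays the role of l and an exception stands in for Ω. The encoding and the helper names are ours, purely illustrative:

```python
# t = x v and s = (\y. x v) (x v), distinguished by a counting argument for x.

class Diverge(Exception):
    """Stands in for the diverging term Omega."""

v = lambda z: z                          # an arbitrary closed value for v

def make_counting_w():
    # models  l := 0; \z. l := !l + 1; if !l = 1 then 0 else Omega
    l = [0]
    def w(_z):
        l[0] += 1
        if l[0] == 1:
            return 0
        raise Diverge
    return w

def t(x): return x(v)                    # applies x once
def s(x): return (lambda y: x(v))(x(v))  # applies x twice

print(t(make_counting_w()))              # 0: terminates
try:
    s(make_counting_w())
    print("terminates")
except Diverge:
    print("diverges")                    # the second application hits Omega
```

A single application cannot tell t and s apart, but the counter's second invocation "diverges", separating them exactly as the context above does.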

It turns out that being able to relate a diverging term to a deferred diverging term is all we need to change from the plain λ-calculus normal-form bisimilarity to get a complete equivalence when we add global store. We do so by distinguishing two cases in the clause for open-stuck terms: a configuration ⟨h | E[x v]⟩ is related to c either if c can reduce to a stuck configuration with related subterms, or if E is a diverging context, in which case we do not require anything of c. The resulting simulation is not symmetric, as it relates a deferred diverging configuration with any configuration c (even a converging one), but the corresponding notion of bisimulation equates such a configuration only with either a configuration of the same kind or a diverging configuration such as ⟨h | Ω⟩.

*Progress.* We define simulation using the notion of *diacritical progress* we developed in a previous work [2,3], which distinguishes between *active* and *passive* clauses. Roughly, passive clauses are between simulation states which should be considered equal, while active clauses are between states where actual progress is taking place. This distinction does not change the notions of bisimulation or bisimilarity, but it simplifies the soundness proof of the bisimilarity. It also allows for the definition of powerful *up-to techniques*, relations that are easier to use than bisimulations but still imply bisimilarity. For normal-form bisimilarity, our framework enables up-to techniques which respect η-expansion [3].

Progress is defined between objects called *candidate relations*, denoted by R, S, T. A candidate relation R contains pairs of configurations, and a set of configurations written R↑, which we expect to be composed of diverging or deferred diverging configurations (for such a relation we take R⁻¹↑ to be R↑). We extend R to stores, terms, values, and contexts with the following definitions.

$$\frac{\mathsf{dom}(h)=\mathsf{dom}(g) \quad \forall l,\ h(l)\ \mathcal{R}^{\mathsf{v}}\ g(l)}{h\ \mathcal{R}^{\mathsf{h}}\ g} \qquad \frac{\langle h \mid t \rangle\ \mathcal{R}\ \langle h \mid s \rangle \quad h \text{ fresh}}{t\ \mathcal{R}^{\mathsf{t}}\ s}$$

$$\frac{v\,x\ \mathcal{R}^{\mathsf{t}}\ w\,x \quad x \text{ fresh}}{v\ \mathcal{R}^{\mathsf{v}}\ w} \qquad \frac{E[x]\ \mathcal{R}^{\mathsf{t}}\ F[x] \quad x \text{ fresh}}{E\ \mathcal{R}^{\mathsf{c}}\ F} \qquad \frac{\langle h \mid E[x] \rangle \in \mathcal{R}{\uparrow} \quad x, h \text{ fresh}}{E \in \mathcal{R}{\uparrow}^{\mathsf{c}}}$$

We use these extensions to define progress as follows.

**Definition 2.** *A candidate relation* R *progresses to* S, T*, written* R ⤳ S, T*, if* R ⊆ S*,* S ⊆ T*, and*

1. c R d *implies:*
   - *if* c → c′*, then* d →* d′ *and* c′ T d′*;*
   - *if* c = ⟨h | v⟩*, then* d →* ⟨g | w⟩*,* h S^h g*, and* v S^v w*;*
   - *if* c = ⟨h | E[x v]⟩*, then either*
     - d →* ⟨g | F[x w]⟩*,* h T^h g*,* E T^c F*, and* v T^v w*, or*
     - E ∈ T↑^c*.*
2. c ∈ R↑ *implies* c ≠ ⟨h | v⟩ *for all* h *and* v*, and:*
   - *if* c → c′*, then* c′ ∈ T↑*;*
   - *if* c = ⟨h | E[x v]⟩*, then* E ∈ T↑^c*.*

*A normal-form simulation is a candidate relation* R *such that* R ⤳ R, R*, and a bisimulation is a candidate relation* R *such that* R *and* R⁻¹ *are simulations. Normal-form bisimilarity* ≈ *is the union of all normal-form bisimulations.*

We test values and contexts by applying or plugging them with a fresh variable x, and running them in a fresh store; with a global memory, the value represented by x may access any reference and assign it an arbitrary value, hence the need for a fresh store. The stores of two bisimilar value configurations must have the same domain, as it would be easy to distinguish them otherwise by testing the content of the references that would be in one store but not in the other.

The main novelty compared to usual definitions of normal-form bisimilarity [3,11] is the set of (deferred) diverging configurations used in the clause for open-stuck terms. We detect that E in a configuration ⟨h | E[x v]⟩ is (deferred) diverging by running ⟨g | E[y]⟩ where y and g are fresh; this configuration may then diverge or evaluate to another deferred diverging configuration ⟨g′ | E′[x′ v′]⟩.

Like in the plain λ-calculus [3], R progresses towards S in the value clause and T in the others; the former is passive while the others are active. Our framework prevents some up-to techniques from being applied after a passive transition. In particular, we want to forbid the application of bisimulation up to context as it would be unsound: we could deduce that v x and w x are equivalent for all v and w just by building a candidate relation containing v and w.

*Example 1.* To prove that ⟨h | x v Ω⟩ ≈ ⟨h | Ω⟩ holds for all v and h, we prove that the candidate R with pair (⟨h | x v Ω⟩, ⟨h | Ω⟩) and with R↑ = {⟨g | y Ω⟩ | y, g fresh} is a bisimulation. Indeed, ⟨h | x v Ω⟩ is stuck with ⟨g | y Ω⟩ ∈ R↑ for fresh y and g, and we have ⟨g | y Ω⟩ → ⟨g | y Ω⟩. Conversely, the transition ⟨h | Ω⟩ → ⟨h | Ω⟩ is matched by ⟨h | x v Ω⟩ →* ⟨h | x v Ω⟩, and the resulting configurations are in R.

#### **2.3 Soundness**

In this framework, proving that ≈ is sound is a consequence of the fact that a form of bisimulation up to context is valid, a result which itself may require proving that other up-to techniques are valid. We distinguish the techniques which can be used in passive clauses (called *strong* up-to techniques) from the ones which cannot. An up-to technique (resp. strong up-to technique) is a function f such that R ⤳ R, f(R) (resp. R ⤳ f(R), f(R)) implies R ⊆ ≈. To show that a given f is an up-to technique, we rely on a notion of *respectfulness*, which is simpler to prove and gives sufficient conditions for f to be an up-to technique.

We briefly recall the notions we need from our previous work [2]. We extend ⊆ and ∪ to functions argument-wise (e.g., (f ∪ g)(R) = f(R) ∪ g(R)), and given a set F of functions, we also write F for the function defined as ⋃_{f∈F} f. We define f^ω as ⋃_{n∈N} f^n. We write id for the identity function on relations, and f̂ for f ∪ id. A function f is monotone if R ⊆ S implies f(R) ⊆ f(S). We write P_fin(R) for the set of finite subsets of R, and we say f is continuous if it can be defined by its image on these finite subsets, i.e., if f(R) ⊆ ⋃_{S∈P_fin(R)} f(S). The up-to techniques we use are defined by inference rules with a finite number of premises, so they are trivially continuous.

**Definition 3.** *A function* f *evolves to* g, h*, written* f ⤳ g, h*, if for all* R *and* T*,* R ⤳ R, T *implies* f(R) ⤳ g(R), h(T)*. A function* f strongly *evolves to* g, h*, written* f ⤳_s g, h*, if for all* R*,* S*, and* T*,* R ⤳ S, T *implies* f(R) ⤳ g(S), h(T)*.*

Evolution can be seen as progress for functions on relations. Evolution is more restrictive than strong evolution, as it considers only relations R such that R ⤳ R, T.

**Definition 4.** *A set* F *of continuous functions is* respectful *if there exists* S *such that* S ⊆ F *and*



**Fig. 1.** Up-to techniques for the calculus with global store

In words, a function is in a respectful set F if it evolves towards a combination of functions in F after active clauses, and towards a combination of functions in S after passive ones. When checking that f is regular (second case), we can use a regular function at most once after a passive clause. The (possibly empty) subset S intuitively represents the strong up-to techniques of F. If S1 and S2 are subsets of F which verify the conditions of the definition, then S1 ∪ S2 also does, so there exists a largest subset of F satisfying the conditions, written strong(F).

**Lemma 1.** *Let* F *be a respectful set.*


Showing that f is in a respectful set F is easier than proving it is an up-to technique. Besides, proving that a bisimulation up to context is respectful implies that ≈ is preserved by contexts thanks to the last property of Lemma 1.

The up-to techniques for the calculus with global store are given in Fig. 1. The techniques subst and plug allow us to prove that ≈ is preserved by substitution and by evaluation contexts. The remaining ones are auxiliary techniques which are used in the respectfulness proof: red relies on the fact that the calculus is deterministic to relate terms up to reduction steps. The technique div allows us to relate a diverging configuration to any other configuration, while plugdiv states that if E is a diverging context, then ⟨h | E[t]⟩ is a diverging configuration for all h and t. We distinguish the technique plug_c from plug_↑ to get a more fine-grained classification, as plug_c is the only one which is not strong.

**Lemma 2.** *The set* F def= {subst, plug_m, red, div, plugdiv | m ∈ {c, ↑}} *is respectful, with* strong(F) = F \ {plug_c}*.*

We omit the proof, as it is similar to, but much simpler than, the one for the calculus with local store of Sect. 3. We deduce that ≈ is sound using Lemma 1.

**Theorem 1.** *For all* t*,* s*, and fresh stores* h*, if* ⟨h | t⟩ ≈ ⟨h | s⟩*, then* t ≡ s*.*

#### **2.4 Completeness**

We prove the reverse implication by building a bisimulation which contains ≡.

**Theorem 2.** *For all* t *and* s*, if* t ≡ s*, then for all fresh stores* h*,* ⟨h | t⟩ ≈ ⟨h | s⟩*.*

*Proof (Sketch).* It suffices to show that the candidate R defined as

{(⟨h | t⟩, ⟨g | s⟩) | ∀E, h_E, closing ∫, ⟨h ⊎ h_E | E[t]∫⟩ ⇓ ⇒ ⟨g ⊎ h_E | E[s]∫⟩ ⇓} ∪ {⟨h | t⟩ | ∀E, h_E, closing ∫, ⟨h ⊎ h_E | E[t]∫⟩ ⇑}

is a simulation. We proceed by case analysis on the behavior of ⟨h | t⟩. The details are in the report [4]; we sketch the proof in the case when ⟨h | t⟩ R ⟨g | s⟩, t = E[x v], and E is not deferred diverging.

A first step is to show that ⟨g | s⟩ also evaluates to an open-stuck configuration with x in function position. To do so, we consider a fresh l and we define ∫ such that ∫(y) sets l to 1 when it is first applied if y = x, and to 2 if y ≠ x. Then ⟨h ⊎ l := 0 | t∫⟩ sets l to 1, which should also be the case of ⟨g ⊎ l := 0 | s∫⟩, and it is possible only if ⟨g | s⟩ →* ⟨g′ | F[x w]⟩ for some g′, F, and w.
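The detection step can be illustrated with a rough Python analogue; the helper `probe_applies_x` and the term encodings below are hypothetical, ours only, with a one-shot flag playing the role of the fresh reference l:

```python
# Detect whether a term, seen as a function of the value substituted for x,
# actually applies x during evaluation; a mutable flag models the fresh l.

def probe_applies_x(term):
    l = [0]
    def for_x(_a):          # the value substituted for x sets l to 1 when applied
        l[0] = 1
        return lambda z: z
    term(for_x)
    return l[0] == 1

t = lambda x: x(0)                  # an open-stuck shape E[x v]: applies x
u = lambda x: (lambda y: y)(7)      # never applies x
print(probe_applies_x(t), probe_applies_x(u))   # True False
```

In the proof, the flag not only records the application but also triggers a change of behavior, which is how ⟨g | s⟩ is forced to reach a stuck configuration on x as well.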

We then have to show that E R^c F, v R^v w, and h R^h g′. We sketch the proof for the contexts, as the proofs for the values and the stores are similar. Given a fresh store h_f, a fresh variable y, a context E′, a store h_{E′}, and a closing substitution ∫, we want ⟨h_f ⊎ h_{E′} | E′[E[y]]∫⟩ ⇓ iff ⟨h_f ⊎ h_{E′} | E′[F[y]]∫⟩ ⇓.

Let l be a fresh reference. Assuming dom(h) = {l_1 ... l_n}, given a term t, we write ⋃_i l_i := h; t for l_1 := h(l_1); ... l_n := h(l_n); t. We define

$$\int\_x \stackrel{\text{def}}{=} \begin{cases} x \mapsto \lambda a.\, \text{if } !l = 0 \text{ then } l := 1; \bigcup\_i l\_i := h\_f \uplus h\_{E'}; \int(y) \text{ else } \int(x)\, a \\ z \mapsto \int(z) \qquad \text{if } z \neq x \end{cases}$$

The substitution ∫_x behaves like ∫ except that when ∫_x(x) is applied for the first time, it replaces its argument by ∫(y) and sets the store to h_f ⊎ h_{E′}. Therefore ⟨h ⊎ l := 0 | E′[t]∫_x⟩ →* ⟨h_f ⊎ h_{E′} ⊎ l := 1 | E′[E[y]]∫_x⟩, but this configuration then behaves like ⟨h_f ⊎ h_{E′} | E′[E[y]]∫⟩. Similarly, ⟨g ⊎ l := 0 | E′[s]∫_x⟩ evaluates to a configuration equivalent to ⟨h_f ⊎ h_{E′} | E′[F[y]]∫⟩, and since ⟨h ⊎ l := 0 | E′[t]∫_x⟩ ⇓ implies ⟨g ⊎ l := 0 | E′[s]∫_x⟩ ⇓, we can conclude from there.

#### **3 Local Store**

We adapt the ideas of the previous section to a calculus where terms create their own local store. To be able to deal with local resources, the relation we define mixes principles from normal-form and environmental bisimilarities.

#### **3.1 Syntax, Semantics, and Contextual Equivalence**

In this section, the terms no longer share a global store, but instead must create local references before storing values. We extend the syntax of Sect. 2 with a construct to create a new reference.

```
Terms: t, s ::= ... | new l := v in t
```
Reference creation new l := v in t binds l in t; we identify terms up to α-conversion of their references. We write fr(t) and fr(E) for the set of free references of t or E, and a term or context is *reference-closed* if its set of free references is empty. Following [18] and in contrast with [5,6], references are not values, but we can still give access to a reference l by passing λx.!l and λx.l := x; λy.y.
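The pair λx.!l and λx.l := x; λy.y corresponds to the following Python sketch (our own encoding, with a closure hiding the reference):

```python
# Expose a local reference only through a getter and a setter closure;
# the setter returns the identity, mirroring  \x. l := x; \y. y.

def new_ref(v):
    l = [v]                               # new l := v in ...
    getter = lambda _x: l[0]              # \x. !l
    def setter(x):
        l[0] = x
        return lambda y: y                # \x. l := x; \y. y
    return getter, setter

get, put = new_ref(0)
put(42)
print(get(None))    # 42
```

The reference itself never escapes: observers interact with it only through the two functions, which is exactly the information hiding the bisimilarity must account for.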

As before, the semantics is defined on configurations h | t verifying fr(t) ⊆ dom(h) and for all l ∈ dom(h), fr(h(l)) ⊆ dom(h). We add to the rules of Sect. 2 the following one for reference creation.

$$\langle h \mid \mathsf{new} \; l := v \; \mathsf{in} \; t \rangle \to \langle h \uplus l := v \mid t \rangle$$

We recall that ⊎ is defined for disjoint stores only, so the above rule assumes that l ∉ dom(h), which is always possible using α-conversion.

We define contextual equivalence on reference-closed terms as we expect programs to allocate their own store.

**Definition 5.** *Two reference-closed terms* t *and* s *are contextually equivalent, written* t ≡ s*, if for all reference-closed evaluation contexts* E *and closing substitutions* ∫*,* ⟨∅ | E[t]∫⟩ ⇓ *iff* ⟨∅ | E[s]∫⟩ ⇓*.*

#### **3.2 Bisimilarity**

With local stores, an external observer no longer has direct access to the stored values. In presence of such information hiding, a sound bisimilarity relies on an *environment* to accumulate terms which should be tested in different stores [8].

*Example 2.* Let f1 def= λx.if !l = true then l := false; true else false and f2 def= λx.true. If we compare new l := true in f1 and f2 only once in the empty store, they are seen as equivalent, as they both return true; however, f1 modifies its store, so running f1 and f2 a second time distinguishes them.
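A quick Python rendering of this example (our own sketch, with a closure for the local reference) shows the first call agreeing and the second call disagreeing:

```python
def make_f1():
    l = [True]                 # new l := true in ...
    def f1(_x):                # \x. if !l = true then l := false; true else false
        if l[0]:
            l[0] = False
            return True
        return False
    return f1

f2 = lambda _x: True

f1 = make_f1()
print(f1(()), f2(()))          # True True: one call does not separate them
print(f1(()), f2(()))          # False True: the second call does
```

This is why sound environmental bisimilarities keep f1 and f2 in the environment, to be tested again as the store evolves.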

Environments generally contain only values [17], except in λμρ [18], where plugged evaluation contexts are kept in the environment when comparing open-stuck configurations. In contrast with λμρ, our environment collects values, and we use a *stack* for registering contexts [7,10]. Unlike values, contexts are therefore tested only once, following a last-in first-out discipline. The next example shows that considering contexts repeatedly would lead to an overly discriminating bisimilarity. For the stack discipline of testing contexts in action, see Example 8 in Sect. 3.4.

*Example 3.* With the same f1 and f2 as in Example 2, the terms t def= new l := true in f1 (x λy.y) and s def= f2 (x λy.y) are contextually equivalent. Roughly, for all closing substitutions ∫, t and s either both diverge (if ∫(x) λy.y diverges) or both evaluate to true, since ∫(x) cannot modify the value in l. Testing f1 and f2 twice would discriminate them and wrongfully distinguish t and s.

*Remark 1.* The bisimilarity for λμρ runs evaluation contexts several times and is still complete because of the μ operator, which, like call/cc, captures evaluation contexts, and may then execute them several times.

We let E range over sets of pairs of values, and Γ over sets of values. Similarly, we write Σ for a stack of pairs of evaluation contexts and σ for a stack of evaluation contexts. We write ε for the empty stack, :: for the operator putting an element on top of a stack, and ++ for the concatenation of two stacks. The projection operator π1 transforms a set or stack of pairs into respectively a set or stack of single elements by taking the first element of each pair. A candidate relation R can be composed of:

- *configuration judgments* E, Σ ⊢ c R d, comparing two configurations;
- *store judgments* E, Σ ⊢ h R g, comparing two stores;
- *diverging configuration judgments* Γ, σ ⊢ c ∈ R↑, stating that c is (deferred) diverging;
- *diverging store judgments* Γ, σ ⊢ h ∈ R↑.
**Definition 6.** *A candidate relation* R *progresses to* S, T*, written* R ⤳ S, T*, if* R ⊆ S*,* S ⊆ T*, and*

1. E, Σ ⊢ c R d *implies:*
   - *if* c → c′*, then* d →* d′ *and* E, Σ ⊢ c′ T d′*;*
   - *if* c = ⟨h | v⟩*, then either*
     - d →* ⟨g | w⟩ *and* E ∪ {(v, w)}, Σ ⊢ h S g*, or*
     - Σ ≠ ε *and* π1(E) ∪ {v}, π1(Σ) ⊢ h ∈ S↑*;*
   - *if* c = ⟨h | E[x v]⟩*, then either*
     - d →* ⟨g | F[x w]⟩ *and* E ∪ {(v, w)}, (E, F) :: Σ ⊢ h S g*, or*
     - π1(E) ∪ {v}, E :: π1(Σ) ⊢ h ∈ S↑*.*
2. E, Σ ⊢ h R g *implies:*
   - *if* v E w*, then* E, Σ ⊢ ⟨h | v x⟩ S ⟨g | w x⟩ *for a fresh* x*;*
   - *if* Σ = (E, F) :: Σ′*, then* E, Σ′ ⊢ ⟨h | E[x]⟩ S ⟨g | F[x]⟩ *for a fresh* x*.*
3. Γ, σ ⊢ c ∈ R↑ *implies:*
   - *if* c → c′*, then* Γ, σ ⊢ c′ ∈ T↑*;*
   - *if* c = ⟨h | v⟩*, then* σ ≠ ε *and* Γ ∪ {v}, σ ⊢ h ∈ S↑*;*
   - *if* c = ⟨h | E[x v]⟩*, then* Γ ∪ {v}, E :: σ ⊢ h ∈ S↑*.*
4. Γ, σ ⊢ h ∈ R↑ *implies* σ ≠ ε *and:*
   - *if* v ∈ Γ*, then* Γ, σ ⊢ ⟨h | v x⟩ ∈ S↑ *for a fresh* x*;*
   - *if* σ = E :: σ′*, then* Γ, σ′ ⊢ ⟨h | E[x]⟩ ∈ S↑ *for a fresh* x*.*

*A normal-form simulation is a candidate relation* R *such that* R ⤳ R, R*, and a bisimulation is a candidate relation* R *such that* R *and* R⁻¹ *are simulations. Normal-form bisimilarity* ≈ *is the union of all normal-form bisimulations.*

When E, Σ ⊢ c R d, we reduce c until we get a value v or a stuck term E[x v]. At that point, either d also reduces to a normal form of the same kind, or we test (the first projection of) the stack Σ for divergence, assuming it is not empty. In the former case, we add the values to E and the evaluation contexts to the top of Σ, getting a judgment of the form E′, Σ′ ⊢ h R g, which then tests the environment and the stack by running either terms in E′ or the context at the top of Σ′.

*Example 4.* We sketch the bisimulation proof for the terms t and s of Example 3. Because ⟨∅ | t⟩ →* ⟨l := true | f1 (x λy.y)⟩ and ⟨∅ | s⟩ = ⟨∅ | f2 (x λy.y)⟩, we need to define R such that {(λy.y, λy.y)}, (f1 □, f2 □) :: ε ⊢ l := true R ∅. Testing the equal values in the environment is easy with up-to techniques. For the contexts on the stack, we need {(λy.y, λy.y)}, ε ⊢ ⟨l := true | f1 z⟩ R ⟨∅ | f2 z⟩ for a fresh z. Since ⟨l := true | f1 z⟩ →* ⟨l := false | true⟩ and ⟨∅ | f2 z⟩ →* ⟨∅ | true⟩, we need {(λy.y, λy.y), (true, true)}, ε ⊢ l := false R ∅, which is simple to check.

*Example 5.* In contrast, we show that t def= new l := true in f1 (x λy.l := y; y) and s def= f2 (x λy.y) are not bisimilar. We would need to build R such that {(λy.l := y; y, λy.y)}, (f1 □, f2 □) :: ε ⊢ l := true R ∅. Testing the values in the environment, we want {(λy.l := y; y, λy.y), (z, z)}, (f1 □, f2 □) :: ε ⊢ l := z R ∅ for a fresh z. Executing the contexts on the stack, we get a stuck term of the form if z then l := false; true else false and a value true, which cannot be related, because the former is not deferred diverging.

The terms t and s are therefore not bisimilar, and they are indeed not contextually equivalent, since t gives access to its private reference by passing λy.l := y; y to x. The function represented by x can then change the value of l to false and break the equivalence.
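Concretely, the leak can be replayed in Python (an illustrative encoding of ours): the function supplied for x uses the leaked setter to flip l before f1 runs.

```python
def t(x):                      # new l := true in f1 (x (\y. l := y; y))
    l = [True]
    def f1(_z):                # \z. if !l = true then l := false; true else false
        if l[0]:
            l[0] = False
            return True
        return False
    def leak(y):               # \y. l := y; y
        l[0] = y
        return y
    return f1(x(leak))

def s(x):                      # f2 (x (\y. y))
    return (lambda _z: True)(x(lambda y: y))

attacker = lambda g: g(False)  # sets l to false through the leaked setter
print(t(attacker), s(attacker))   # False True: the observer separates t and s
```

With the harmless λy.y in place of the leak, both sides return True, matching Example 3.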

The last two cases of the bisimulation definition aim at detecting a deferred diverging context. The judgment Γ, σ ⊢ h ∈ R↑ roughly means that if σ = E_n :: ... :: E_1 :: ε, then the configuration ⟨h′ | E_1[... E_n[x] ...]⟩ diverges for all fresh x and all h′ obtained by running a term from Γ with the store h. As a result, when Γ, σ ⊢ h ∈ R↑, we have two possibilities: either we run a term from Γ in h to potentially change h, or we run the context at the top of σ (which cannot be empty in that case) to check if it is diverging. In both cases, we get a judgment of the form Γ, σ′ ⊢ c ∈ R↑ for some stack σ′. Then either c diverges and we are done, or it terminates, meaning that we have to look for divergence in σ′.

*Example 6.* We prove that ⟨∅ | x v Ω⟩ and ⟨∅ | Ω⟩ are bisimilar. We define R such that ∅, ε ⊢ ⟨∅ | x v Ω⟩ R ⟨∅ | Ω⟩, for which we need {v}, (□ Ω) :: ε ⊢ ∅ ∈ R↑, which itself holds if {v}, ε ⊢ ⟨∅ | y Ω⟩ ∈ R↑ for a fresh y.

Finally, only the two clauses where a reduction step takes place are active; all the others are passive, because they are simply switching from one judgment to


**Fig. 2.** Selected up-to techniques for the calculus with local store

the other without any real progress taking place. For example, when comparing value configurations, we go from a configuration judgment E, Σ ⊢ c R d to a store judgment E, Σ ⊢ h R g or a diverging store judgment Γ, σ ⊢ h ∈ R↑. In a (diverging) store judgment, we simply decide whether we run a value from the environment or a context from the stack, going back to a (diverging) configuration judgment. Actual progress is made only when we start reducing the chosen configuration.

#### **3.3 Soundness and Completeness**

We briefly discuss the up-to techniques we need to prove soundness. We write E{(v, w)/x} for the environment {(v′{v/x}, w′{w/x}) | v′ E w′}, and we also define Σ{(v, w)/x}, Γ{v/x}, and σ{v/x} as expected. To save space, Fig. 2 presents the up-to techniques for the configuration judgment only; see the report [4] for the other judgments.

As in Sect. 2.3, the techniques subst and plug allow us to reason up to substitution and up to plugging into an evaluation context, except that the substituted values and plugged contexts must be taken respectively from the environment and from the top of the stack. The technique div relates a diverging configuration to any configuration, like in the calculus with global store. The technique ccomp allows us to merge successive contexts in the stack into one. The weakening technique weak, originally known as bisimulation up to environment [17], is a usual technique for environmental bisimulations. Making the environment smaller creates a weaker judgment, as having fewer testing terms means a less discriminating candidate relation. Bisimulation up to reduction red is also standard and allows for big-step reasoning by ignoring reduction steps. Finally, the technique refl allows us to introduce identical contexts in the stack, but also values in the environment or terms in configurations (see the report [4]).

We denote by subst_c the up-to-substitution technique restricted to the configuration and diverging configuration judgments, and by subst_s its restriction to the store and diverging store judgments.

**Lemma 3.** *The set* F def= {subst_m, plug, ccomp, div, weak, red, refl | m ∈ {c, s}} *is respectful, with* strong(F) = {subst_s, ccomp, div, weak, red, refl}*.*

In contrast with Sect. 2.3 and our previous work [3], subst_c is *not* strong, because values are taken from the environment. Indeed, if subst_c were strong, then from {(v, w)}, ε ⊢ ∅ R ∅ we could derive {(v, w)}, ε ⊢ ⟨∅ | x y⟩ refl(R) ⟨∅ | x y⟩ and then {(v, w)}, ε ⊢ ⟨∅ | v y⟩ subst_c(refl(R)) ⟨∅ | w y⟩ for any v and w, which would be unsound.

The respectfulness proofs are in the report [4]. Using refl, plug, subst<sup>c</sup>, and Lemma 1, we prove that ≈ is preserved by evaluation contexts and substitution, from which we deduce that it is sound w.r.t. contextual equivalence.

**Theorem 3.** *For all* t *and* s*, if* ∅, ∅ | t ≈ ∅ | s*, then* t ≡ s*.*

To establish completeness, we follow the proof of Theorem 2, i.e., we construct a candidate relation R that contains ≡ and prove it is a simulation by case analysis on the behavior of the related terms.

**Theorem 4.** *For all* t *and* s*, if* t ≡ s*, then* ∅, ∅ | t ≈ ∅ | s*.*

The main difference is that the contexts and closing substitutions are built from the environment using compatible closures [17], to take into account the private resources of the related terms. We discuss the proof in the report [4].

#### **3.4 Examples**

*Example 7.* We start with the so-called awkward example [5,6,15]. Let

$$v \stackrel{\text{def}}{=} \lambda f.l := 0; f \; (); l := 1; f \; (); !l \qquad w \stackrel{\text{def}}{=} \lambda f.f \; (); f \; (); 1.$$

We equate new l := 0 in v and w, building the candidate R incrementally, starting from {(v, w)}, ∅ ⊢ l := 0 R ∅.

Running v and w with a fresh variable f, we obtain l := 0 | E1[f ()] and ∅ | F1[f ()] with E<sup>1</sup> def = -; l := 1; f (); !l and F<sup>1</sup> def = -; f (); 1. Ignoring the identical unit arguments (using refl), we need {(v, w)}, (E1, F1) :: ∅ ⊢ l := 0 R ∅; from that point, we can either test v and w again, resulting in an extra pair (E1, F1) on the stack, or run l := 0 | E1[g] and ∅ | F1[g] for a fresh g instead.

In the latter case, we get l := 1 | E2[g ()] and ∅ | F2[g ()], with E<sup>2</sup> def = -; !l and F<sup>2</sup> def = -; 1, so we want {(v, w)}, (E2, F2) :: ∅ ⊢ l := 1 R ∅ (again ignoring the units). From there, testing v and w produces {(v, w)}, (E1, F1) :: (E2, F2) :: ∅ ⊢ l := 0 R ∅, while executing l := 1 | E2[x] and ∅ | F2[x] for a fresh x gives us l := 1 | 1 and ∅ | 1. This analysis suggests that R should be composed only of judgments of the form {(v, w)}, Σ ⊢ l := n R ∅ such that n ∈ {0, 1} and

– Σ is an arbitrary stack composed only of pairs (E1, F1) or (E2, F2);
– if Σ = (E2, F2) :: Σ′, then n = 1.

We can check that such a candidate is a bisimulation, and it ensures that when l is read (when E<sup>2</sup> is executed), it contains the value 1.
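To make the intuition concrete, here is a minimal Python sketch of the awkward example; Python stands in for the paper's λ-calculus with references, and `make_v` is a hypothetical helper modeling `new l := 0 in v`. The point is that the final read of the private location always yields 1, matching w, even for callbacks that re-enter v:

```python
# Sketch of the awkward example; the one-element list plays the role of the
# private reference l (assumption: callbacks terminate).
def make_v():
    l = [0]  # models `new l := 0 in v`; invisible outside v
    def v(f):
        l[0] = 0
        f()
        l[0] = 1
        f()
        return l[0]  # the final read !l
    return v

def w(f):
    f()
    f()
    return 1
```

Even a reentrant callback leaves l set to 1 at the final read, which is the invariant captured by the side condition on Σ above.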

*Example 8.* As a variation on the awkward example, let

$$v \stackrel{\text{def}}{=} \lambda f.l :=\,!l+1; f \text{ ()}; l :=\,!l-1;\, !l > 0 \qquad w \stackrel{\text{def}}{=} \lambda f.f \text{ ()}; \text{true.}$$

We show that ∅ | new l := 1 in v and ∅ | w are bisimilar. Let E def = -; l :=!l − 1; !l > 0 and F def = -; true. We write (E, F)<sup>n</sup> for the stack ∅ if n = 0 and (E, F) :: (E, F)<sup>n−1</sup> otherwise. Then the candidate R verifying {(v, w)}, (E, F)<sup>n</sup> ⊢ l := n + 1 R ∅ for any n is a bisimulation. Indeed, running v and w increases the value stored in l and adds a pair (E, F) to the stack. If n > 0, we can run a copy of E and F, thus decreasing the value in l by 1; both sides then return true.
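The invariant of this variant, namely that l always holds the number of pending calls plus one, can also be sketched in Python (again with a hypothetical `make_v` helper modeling `new l := 1 in v`):

```python
# Sketch of the counter variant: l counts pending calls plus one, so the
# final comparison !l > 0 always succeeds (assumption: callbacks terminate).
def make_v():
    l = [1]  # models `new l := 1 in v`
    def v(f):
        l[0] += 1           # l := !l + 1
        f()
        l[0] -= 1           # l := !l - 1
        return l[0] > 0     # !l > 0
    return v

def w(f):
    f()
    return True
```

With n reentrant calls pending, the decrement leaves l = n + 1 − 1 ≥ 1, so both sides return true, as in the candidate relation above.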

*Example 9.* This deferred divergence example comes from Dreyer et al. [5]. Let

$$\begin{aligned} v\_1 &\stackrel{\text{def}}{=} \lambda x. \text{if } !l \text{ then } \Omega \text{ else } k := \text{true}; \lambda y. y & \quad & w\_1 \stackrel{\text{def}}{=} \lambda x. \Omega\\ v\_2 &\stackrel{\text{def}}{=} \lambda f. f \; v\_1; \text{if } !k \text{ then } \Omega \text{ else } l := \text{true}; \lambda y. y & \quad & w\_2 \stackrel{\text{def}}{=} \lambda f. f \; w\_1; \lambda y. y \end{aligned}$$

We prove that new l := false in new k := false in v<sup>2</sup> is equivalent to w2. Informally, if f in w<sup>2</sup> applies its argument w1, the term diverges. Divergence also happens in v<sup>2</sup> but in a delayed fashion, as v<sup>1</sup> first sets k to true, and the continuation t def = if !k then Ω else l := true; λy.y then diverges. Similarly, if f stores w<sup>1</sup> or v<sup>1</sup> to later apply it, then divergence also occurs in both cases: in that case t sets l to true, and when v<sup>1</sup> is later applied, it diverges.

To build a candidate R, we execute l := false; k := false | v<sup>2</sup> f and ∅ | w<sup>2</sup> f for a fresh f, which gives us l := false; k := false | E[f v1] and ∅ | F[f w1] with E def = -; t and F def = -; λy.y. We consider {(v2, w2), (v1, w1)}, (E, F) :: ∅ ⊢ l := false; k := false R ∅, for which we have several checks to do. The interesting one is running l := false; k := false | v<sup>1</sup> x and ∅ | w<sup>1</sup> x, as we get l := false; k := true | λy.y and ∅ | Ω. In that case, we show that the stack contains divergence, by establishing that {v2, v1, λy.y}, E :: ∅ ⊢ l := false; k := true ∈ R↑; indeed, we have l := false; k := true | E[x] →<sup>∗</sup> l := false; k := true | Ω for a fresh x. In the end, the relation R verifying

$$\begin{aligned} \{(v\_2, w\_2), (v\_1, w\_1)\}, (E, F)^n &\vdash l := \mathtt{false}; k := \mathtt{false} \ \mathcal{R} \ \emptyset \\ \{(v\_2, w\_2), (v\_1, w\_1)\}, (E, F)^n &\vdash \langle l := \mathtt{false}; k := \mathtt{true} \mid \lambda y. y\rangle \ \mathcal{R} \ \langle \emptyset \mid \Omega\rangle \\ \{v\_2, v\_1, \lambda y. y\}, E^n &\vdash l := \mathtt{false}; k := \mathtt{true} \in \mathcal{R}{\uparrow} \\ \{v\_2, v\_1, \lambda y. y\}, E^n &\vdash \langle l := \mathtt{false}; k := \mathtt{true} \mid \Omega\rangle \in \mathcal{R}{\uparrow} \\ \{(v\_2, w\_2), (v\_1, w\_1)\}, (E, F)^n &\vdash l := \mathtt{true}; k := \mathtt{false} \ \mathcal{R} \ \emptyset \\ \{(v\_2, w\_2), (v\_1, w\_1)\}, (E, F)^n &\vdash \langle l := \mathtt{true}; k := \mathtt{false} \mid \Omega\rangle \ \mathcal{R} \ \langle \emptyset \mid \Omega\rangle \end{aligned}$$

for all n is a bisimulation up to refl and red.

#### **4 Related Work and Conclusion**

*Related Work.* As pointed out in Sect. 1, the other bisimilarities defined for state either feature universal quantification over testing arguments [9,12,17,19], or are complete only for a more expressive language [18]. Kripke logical relations [1,5] also involve quantification over arguments when testing terms of a functional type. Finally, denotational models [10,13] can also be used to prove program equivalence, by showing that the denotations of two terms are equal. However, computing such denotations is difficult in general, and the automation of this task is so far restricted to a language with first-order references [14].

The work most closely related to ours is Jaber and Tabareau's Kripke Open Bisimulation (KOB) [6]. A KOB tests functional terms with fresh variables and not with related values like a regular logical relation would do. To relate two given configurations, one has to provide a World Transition System (WTS) which states the invariants the heaps of the configurations should satisfy and how to go from one invariant to the other during the evaluation. Similarly, the bisimulations for the examples of Sect. 3.4 state properties which could be seen as invariants about the stores at different points of the evaluation.

The difficulty for KOB, as well as for our bisimilarity, is to come up with the right invariants about the heaps, expressed either as a WTS or as a bisimulation. We believe that choosing one technique over the other is just a matter of preference, depending on whether one is more comfortable with game semantics or with coinduction. It would be interesting to see whether there is a formal correspondence between KOB and our bisimilarity; we leave this question for future work.

*Conclusion.* We define a sound and complete normal-form bisimilarity for higher-order local state, with an environment that makes it possible to run terms in different stores. We distinguish in the environment the values, which may be tested several times, from the contexts, which should be executed only once. The other difficulty is to relate deferred and regular diverging terms, which is taken care of by the specific judgments about divergence. The lack of quantification over arguments makes the bisimulation proofs quite simple.

Future work includes making these proofs even simpler by defining appropriate up-to techniques. The techniques we use in Sect. 3.3 to prove soundness turn out not to be that useful when establishing the equivalences of Sect. 3.4, except for trivial ones such as up to reduction or up to reflexivity. The difficulty in defining the candidate relations for the examples of Sect. 3.4 lies in finding the right property relating the stack Σ to the store, so perhaps an up-to technique could make this task easier.

As pointed out in Sect. 1, our results can be seen as an indication of what kind of additional infrastructure in a complete normal-form bisimilarity is required when the considered syntactic theory becomes less discriminative—in our case, when control operators vanish from the picture, and mutable state is the only extension of the λ-calculus. A question one could then ask is whether we can find a less expressive calculus—maybe the plain λ-calculus itself—for which a suitably enhanced normal-form bisimilarity is still complete.

**Acknowledgements.** We thank Guilhem Jaber and the anonymous reviewers for their comments. This work was supported by the National Science Centre, Poland, grant no. 2014/15/B/ST6/00619 and by COST Action EUTypes CA15123.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Identifiers in Registers Describing Network Algorithms with Logic**

Benedikt Bollig, Patricia Bouyer, and Fabian Reiter(B)

LSV, CNRS, ENS Paris-Saclay, Université Paris-Saclay, Cachan, France {bollig,bouyer}@lsv.fr, fabian.reiter@gmail.com

**Abstract.** We propose a formal model of distributed computing based on register automata that captures a broad class of synchronous network algorithms. The local memory of each process is represented by a finite-state controller and a fixed number of registers, each of which can store the unique identifier of some process in the network. To underline the naturalness of our model, we show that it has the same expressive power as a certain extension of first-order logic on graphs whose nodes are equipped with a total order. Said extension lets us define new functions on the set of nodes by means of a so-called partial fixpoint operator. In spirit, our result bears close resemblance to a classical theorem of descriptive complexity theory that characterizes the complexity class pspace in terms of partial fixpoint logic (a proper superclass of the logic we consider here).

# **1 Introduction**

This paper is part of an ongoing research project aiming to develop a *descriptive complexity* theory for *distributed computing*.

In classical sequential computing, descriptive complexity is a well-established field that connects computational complexity classes to equi-expressive classes of logical formulas. It began in the 1970s, when Fagin showed in [6] that the graph properties decidable by nondeterministic Turing machines in polynomial time are exactly those definable in existential second-order logic. This provided a logical—and thus machine-independent—characterization of the complexity class np. Subsequently, many other popular classes, such as p, pspace, and exptime, were characterized in a similar manner (see for instance the textbooks [8,12,15]).

Of particular interest to us is a result due to Abiteboul, Vianu [1], and Vardi [19], which states that on structures equipped with a total order relation, the properties decidable in pspace coincide with those definable in *partial fixpoint logic*. The latter is an extension of first-order logic with an operator that allows us to inductively define new relations of arbitrary arity. Basically, this means that new relations can occur as free (second-order) variables in the logical formulas that define them. Those variables are initially interpreted as empty relations and then iteratively updated, using the defining formulas as update rules. If the sequence of updates converges to a fixpoint, then the ultimate interpretations are the relations reached in the limit. Otherwise, the variables are simply interpreted as empty relations. Hence the term "partial fixpoint".

While well-developed in the classical case, descriptive complexity has so far not received much attention in the setting of distributed network computing. As far as the authors are aware, the first step in this direction was taken by Hella et al. in [10,11], where they showed that basic *modal logic* evaluated on finite graphs has the same expressive power as a particular class of *distributed automata* operating in constant time. Those automata constitute a weak model of distributed computing in arbitrary network topologies, where all nodes synchronously execute the same finite-state machine and communicate with each other by broadcasting messages to their neighbors. Motivated by this result, several variants of distributed automata were investigated by Kuusisto and Reiter in [14,18] and [17] to establish similar connections with standard logics such as the *modal* μ*-calculus* and *monadic second-order logic*. However, since the models of computation investigated in those works are based on anonymous finite-state machines, they are much too weak to solve many of the problems typically considered in distributed computing, such as leader election or constructing a spanning tree. It would thus be desirable to also characterize stronger models.

A common assumption underlying many distributed algorithms is that each node of the considered network is given a unique identifier. This allows us, for instance, to elect a leader by making the nodes broadcast their identifiers and then choose the one with the smallest identifier as the leader. To formalize such algorithms, we need to go beyond finite-state machines because the number of bits required to encode a unique identifier grows logarithmically with the number of nodes in the network. Recently, in [2,3], Aiswarya, Bollig and Gastin introduced a synchronous model where, in addition to a finite-state controller, nodes also have a fixed number of registers in which they can store the identifiers of other nodes. Access to those registers is rather limited in the sense that their contents can be compared with respect to a total order, but their numeric values are unknown to the nodes. (This restriction corresponds precisely to the notion of *order-invariant* distributed algorithms, which was introduced by Naor and Stockmeyer in [16].) Similarly, register contents can be copied, but no new values can be generated. Since the original motivation for the model was to automatically verify certain distributed algorithms running on ring networks, its formal definition is tailored to that particular setting. However, the underlying principle can be generalized to arbitrary networks of unbounded maximum degree, which was the starting point for the present work.

*Contributions.* While on an intuitive level, the idea of finite-state machines equipped with additional registers might seem very natural, it does not immediately yield a formal model for distributed algorithms in arbitrary networks. In particular, it is not clear what would be the canonical way for nodes to communicate with a non-constant number of peers, if we require that they all follow the same, finitely representable set of rules.

The model we propose here, dubbed *distributed register automata*, is an attempt at a solution. As in [2,3], nodes proceed in synchronous rounds and have a fixed number of registers, which they can compare and update without having access to numeric values. The new key ingredient that allows us to formalize communication between nodes of unbounded degree is a local computing device we call *transition maker*. This is a special kind of register machine that the nodes can use to scan the states and register values of their entire neighborhood in a sequential manner. In every round, each node runs the transition maker to update its own local configuration (i.e., its state and register valuation) based on a snapshot of the local configurations of its neighbors in the previous round. A way of interpreting this is that the nodes communicate by broadcasting their local configurations as messages to their neighbors. Although the resulting model of computation is by no means universal, it allows formalizing algorithms for a wide range of problems, such as constructing a spanning tree (see Example 5) or testing whether a graph is Hamiltonian (see Example 6).

Nevertheless, our model is somewhat arbitrary, since it could be just one particular choice among many other similar definitions capturing different classes of distributed algorithms. What justifies our choice? This is where descriptive complexity comes into play. By identifying a logical formalism that has the same expressive power as distributed register automata, we provide substantial evidence for the naturalness of that model. Our formalism, referred to as *functional fixpoint logic*, is a fragment of the above-mentioned partial fixpoint logic. Like the latter, it also extends first-order logic with a partial fixpoint operator, but a weaker one that can only define unary functions instead of arbitrary relations. We show that on totally ordered graphs, this logic allows one to express precisely the properties that can be decided by distributed register automata. The connection is strongly reminiscent of Abiteboul, Vianu and Vardi's characterization of pspace, and thus contributes to the broader objective of extending classical descriptive complexity to the setting of distributed computing. Moreover, given that logical formulas are often more compact and easier to understand than abstract machines (compare Examples 6 and 8), logic could also become a useful tool in the formal specification of distributed algorithms.

The remainder of this paper is structured around our main result:

**Theorem 1.** *When restricted to finite graphs whose nodes are equipped with a total order, distributed register automata are effectively equivalent to functional fixpoint logic.*

After giving some preliminary definitions in Sect. 2, we formally introduce distributed register automata in Sect. 3 and functional fixpoint logic in Sect. 4. We then sketch the proof of Theorem 1 in Sect. 5, and conclude in Sect. 6.

#### **2 Preliminaries**

We denote the empty set by ∅, the set of nonnegative integers by N = {0, 1, 2, ...}, and the set of integers by Z = {..., −1, 0, 1, ...}. The cardinality of any set S is written as |S| and its power set as 2<sup>S</sup>.

In analogy to the commonly used notation for real intervals, we define the notation [m : n] := {i ∈ Z | m ≤ i ≤ n} for any m, n ∈ Z such that m ≤ n. To indicate that an endpoint is excluded, we replace the corresponding square bracket with a parenthesis, e.g., (m : n] := [m : n] \ {m}. Furthermore, if we omit the first endpoint, it defaults to 0. This gives us shorthand notations such as [n] := [0 : n] and [n) := [0 : n) = [0 : n − 1].
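As an informal cross-check, the four interval notations can be transcribed into Python ranges (the helper names are illustrative, not from the paper):

```python
def closed(m, n):      # [m : n] = {i | m <= i <= n}
    return range(m, n + 1)

def left_open(m, n):   # (m : n] = [m : n] \ {m}
    return range(m + 1, n + 1)

def upto(n):           # [n]  = [0 : n]
    return range(0, n + 1)

def below(n):          # [n)  = [0 : n) = [0 : n - 1]
    return range(0, n)
```

In particular, a graph's node set V = [n) corresponds exactly to `range(n)`.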

All graphs we consider are finite, simple, undirected, and connected. For notational convenience, we identify their nodes with nonnegative integers, which also serve as unique identifiers. That is, when we talk about the *identifier* of a node, we mean its numerical representation. A *graph* is formally represented as a pair G = (V, E), where the set V of *nodes* is equal to [n), for some integer n ≥ 2, and the set E consists of undirected *edges* of the form e = {u, v} ⊆ V such that u ≠ v. Additionally, E must satisfy that every pair of nodes is connected by a sequence of edges. The restriction to graphs of size at least two is for technical reasons; it ensures that we can always encode Boolean values as nodes.

We refer the reader to [5] for standard graph theoretic terms such as *neighbor*, *degree*, *maximum degree*, *distance*, and *spanning tree*.

Graphs are used to model computer networks, where nodes correspond to processes and edges to communication links. To represent the current configuration of a system as a graph, we equip each node with some additional information: the current state of the corresponding process, taken from a nonempty finite set Q, and some pointers to other processes, modeled by a finite set R of registers.

We call Σ = (Q, R) a *signature* and define a Σ-*configuration* as a tuple C = (G, q, r), where G = (V, E) is a graph, called the *underlying* graph of C, q: V → Q is a *state function* that assigns to each node a state q ∈ Q, and r: V → V<sup>R</sup> is a *register valuation function* that associates with each node a *register valuation* ρ ∈ V<sup>R</sup>. The set of all Σ-configurations is denoted by C(Σ). Figure 1 on page 6 illustrates part of a ({q1, q2, q3}, {r1, r2, r3})-configuration.

If R = ∅, then we are actually dealing with a tuple (G, q), which we call a Q-*labeled graph*. Accordingly, the elements of Q may also be called *labels*. A set P of labeled graphs will be referred to as a *graph property*. Moreover, if the labels are irrelevant, we set Q equal to the singleton 1 := {ε}, where ε is our dummy label. In this case, we identify (G, q) with G and call it an *unlabeled* graph.

### **3 Distributed Register Automata**

Many distributed algorithms can be seen as *transducers*. A leader-election algorithm, for instance, takes as input a network and outputs the same network, but with every process storing the identifier of the unique leader in some dedicated register r. Thus, the algorithm transforms a (1, ∅)-configuration into a (1, {r})-configuration. We say that it defines a (1, ∅)-(1, {r})-transduction. By the same token, if we consider distributed algorithms that *decide* graph properties (e.g., whether a graph is Hamiltonian), then we are dealing with a (I, ∅)-({yes, no}, ∅)-transduction, where I is some set of labels. The idea is that a graph will be accepted if and only if every process eventually outputs yes.

Let us now formalize the notion of transduction. For any two signatures Σ*in* = (I, R*in*) and Σ*out* = (O, R*out*), a Σ*in*-Σ*out*-*transduction* is a *partial* mapping T: C(Σ*in*) → C(Σ*out*) such that, if defined, T(G, q, r) = (G, q′, r′) for some q′ and r′. That is, a transduction does not modify the underlying graph but only the states and register valuations. We denote the set of all Σ*in*-Σ*out*-transductions by T(Σ*in*, Σ*out*) and refer to Σ*in* and Σ*out* as the *input* and *output signatures* of T. By extension, I and O are called the sets of *input* and *output labels*, and R*in* and R*out* the sets of *input* and *output registers*. Similarly, any Σ*in*-configuration C can be referred to as an *input configuration* of T and T(C) as an *output configuration*.

Next, we introduce our formal model of distributed algorithms.

**Definition 2 (Distributed register automaton).** *Let* Σ*in* = (I,R*in*) *and* Σ*out* = (O, R*out*) *be two signatures. A* distributed register automaton *(or simply* automaton*) with input signature* Σ*in and output signature* Σ*out is a tuple* A = (Q, R, ι, Δ, H, o) *consisting of a nonempty finite set* Q *of* states*, a finite set* <sup>R</sup> *of* registers *that includes both* <sup>R</sup>*in and* <sup>R</sup>*out, an* input function <sup>ι</sup>: <sup>I</sup> <sup>→</sup> <sup>Q</sup>*, a transition maker* Δ *whose specification will be given in Definition 3 below, a set* <sup>H</sup> <sup>⊆</sup> <sup>Q</sup> *of* halting states*, and an* output function <sup>o</sup>: <sup>H</sup> <sup>→</sup> <sup>O</sup>*. The registers in* <sup>R</sup> \ (R*in* <sup>∪</sup> <sup>R</sup>*out*) *are called* auxiliary registers*.*

Automaton A computes a transduction T<sub>A</sub> ∈ T(Σ*in*, Σ*out*). To do so, it runs in a sequence of synchronous rounds on the input configuration's underlying graph G = (V, E). After each round, the automaton's global configuration is a (Q, R)-configuration C = (G, q, r), i.e., the underlying graph is always G. As mentioned before, for a node v ∈ V, we interpret q(v) ∈ Q as the current state of v and r(v) ∈ V<sup>R</sup> as the current register valuation of v. Abusing notation, we let C(v) := (q(v), r(v)) and say that C(v) is the *local configuration* of v. In Fig. 1, the local configuration of node 17 is (q1, {r1, r2, r3 ↦ 17, 34, 98}).

For a given input configuration C = (G, q, r) ∈ C(Σ*in*), the automaton's *initial configuration* is C′ = (G, ι ◦ q, r′), where for all v ∈ V, we have r′(v)(r) = r(v)(r) if r ∈ R*in*, and r′(v)(r) = v if r ∈ R \ R*in*. This means that every node v is initialized to state ι(q(v)), and v's initial register valuation r′(v) assigns v's own identifier (provided by G) to all non-input registers while keeping the given values assigned by r(v) to the input registers.

Each subsequent configuration is obtained by running the transition maker Δ synchronously on all nodes. As we will see, Δ computes a function

$$[\![\Delta]\!] \colon (Q \times V^R)^+ \to Q \times V^R$$

that maps nonempty sequences of local configurations to local configurations. This allows the automaton A to transition from a given configuration C to the next configuration C′ as follows. For every node u ∈ V of degree d, we consider the list v<sub>1</sub>, ..., v<sub>d</sub> of u's neighbors sorted in ascending (identifier) order, i.e., v<sub>i</sub> < v<sub>i+1</sub> for i ∈ [1 : d). (See Fig. 1 for an example, where u corresponds to node 17.) If u is already in a halting state, i.e., if C(u) = (q, ρ) ∈ H × V<sup>R</sup>,

**Fig. 1.** Part of a configuration, as seen by a single node. Assuming the identifiers of the nodes are the values represented in black boxes (i.e., those stored in register r1), the automaton at node 17 will update its own local configuration (q1, {r1, r2, r3 ↦ 17, 34, 98}) by running the transition maker on the sequence consisting of the local configurations of nodes 17, 2, 34, and 98 (in that exact order).

then its local configuration does not change anymore, i.e., C′(u) = C(u). Otherwise, we define C′(u) = [[Δ]](C(u), C(v<sub>1</sub>), ..., C(v<sub>d</sub>)), which we may write more suggestively as

$$[\![\Delta]\!] \colon C(u) \xrightarrow{C(v\_1), \dots, C(v\_d)} C'(u).$$

Intuitively, node u updates its own local configuration by using Δ to scan a snapshot of its neighbors' local configurations. As the system is synchronous, this update procedure is performed simultaneously by all nodes.
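As a rough illustration of this update procedure, the following Python sketch simulates one synchronous round on a centralized representation of a configuration. The dictionary encoding and the `delta` callback standing in for [[Δ]] are assumptions of this sketch, not the paper's formalism:

```python
def step(graph, config, delta, halting):
    """One synchronous round of a distributed register automaton (sketch).

    graph   : dict mapping each node id to the set of its neighbours
    config  : dict mapping each node id to its local configuration (state, regs)
    delta   : function standing in for the transition maker [[Delta]]; it gets
              the node's own local configuration followed by those of its
              neighbours in ascending identifier order
    halting : set of halting states
    """
    new_config = {}
    for u in graph:
        state, _regs = config[u]
        if state in halting:
            # a halted node never changes its local configuration again
            new_config[u] = config[u]
        else:
            snapshot = [config[u]] + [config[v] for v in sorted(graph[u])]
            new_config[u] = delta(snapshot)
    return new_config
```

All nodes read the previous-round snapshot, so the update is indeed simultaneous: `new_config` is built entirely from `config`.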

A configuration C = (G, q, r) is called a *halting configuration* if all nodes are in a halting state, i.e., if q(v) ∈ H for all v ∈ V. We say that A halts if it reaches a halting configuration.

The output configuration produced by a halting configuration C = (G, q, r) is the Σ*out*-configuration C′ = (G, o ◦ q, r′), where for all v ∈ V and r ∈ R*out*, we have r′(v)(r) = r(v)(r). In other words, each node v outputs the state o(q(v)) and keeps in its output registers the values assigned by r(v).

It is now obvious that A defines a transduction T<sub>A</sub>: C(Σ*in*) → C(Σ*out*). If A receives the input configuration C ∈ C(Σ*in*) and eventually halts and produces the output configuration C′ ∈ C(Σ*out*), then T<sub>A</sub>(C) = C′. Otherwise (if A does not halt), T<sub>A</sub>(C) is undefined.

*Deciding graph properties.* Our primary objective is to use distributed register automata as decision procedures for graph properties. Therefore, we will focus on automata A that halt in a finite number of rounds on *every* input configuration, and we often restrict to input signatures of the form (I, ∅) and the output signature ({yes, no}, ∅). For example, for I = {a, b}, we may be interested in the set of I-labeled graphs that have exactly one a-labeled node v (the "leader"). We stipulate that A *accepts* an input configuration C with underlying graph G = (V, E) if T<sub>A</sub>(C) = (G, q, r) such that q(v) = yes for *all* v ∈ V. Conversely, A *rejects* C if T<sub>A</sub>(C) = (G, q, r) such that q(v) = no for *some* v ∈ V. This corresponds to the usual definition chosen in the emerging field of *distributed decision* [7]. Accordingly, a graph property P is *decided* by A if the automaton accepts all input configurations that satisfy P and rejects all the others.
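The asymmetric acceptance condition (all nodes say yes versus some node says no) transcribes directly; this sketch assumes the same hypothetical dictionary encoding of configurations, mapping each node to its local configuration:

```python
def accepts(output_config):
    # accepted iff *every* node ends up outputting yes
    return all(state == "yes" for state, _regs in output_config.values())

def rejects(output_config):
    # rejected iff *some* node ends up outputting no
    return any(state == "no" for state, _regs in output_config.values())
```

Note that with output labels restricted to {yes, no}, a halting run is rejected exactly when it is not accepted.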

It remains to explain how the transition maker Δ works internally.

**Definition 3 (Transition maker).** *Suppose that* A = (Q, R, ι, Δ, H, o) *is a distributed register automaton. Then its* transition maker Δ = (Q̃, R̃, ι̃, δ̃, õ) *consists of a nonempty finite set* Q̃ *of* inner states*, a finite set* R̃ *of* inner registers *that is disjoint from* R*, an* inner initial state ι̃ ∈ Q̃*, an* inner transition function δ̃: Q̃ × Q × 2<sup>(R̃ ∪ R)²</sup> → Q̃ × (R̃ ∪ R)<sup>R̃</sup>*, and an* inner output function õ: Q̃ → Q × R̃<sup>R</sup>*.*

Basically, a transition maker Δ = (Q, ˜ R, ˜ ˜ι, ˜δ, o˜) is a sequential register automaton (in the spirit of [13]) that reads a nonempty sequence (q0, ρ0),...,(qd, ρd) <sup>∈</sup> (Q×<sup>V</sup> <sup>R</sup>)<sup>+</sup> of local configurations of <sup>A</sup> in order to produce a new local configuration (q , ρ ). While reading this sequence, it traverses itself a sequence (˜q0, ρ˜0),...,(˜q<sup>d</sup>+1, ρ˜<sup>d</sup>+1) of *inner configurations*, which each consist of an inner state ˜q<sup>i</sup> <sup>∈</sup> <sup>Q</sup>˜ and an *inner register valuation* <sup>ρ</sup>˜<sup>i</sup> <sup>∈</sup> (<sup>V</sup> ∪ {⊥})<sup>R</sup>˜ , where the symbol ⊥ represents an undefined value. For the initial inner configuration, we set ˜q<sup>0</sup> = ˜<sup>ι</sup> and ˜ρ0(˜r) = <sup>⊥</sup> for all ˜<sup>r</sup> <sup>∈</sup> <sup>R</sup>˜. Now for <sup>i</sup> <sup>∈</sup> [d], when <sup>Δ</sup> is in the inner configuration (˜qi, ρ˜i) and reads the local configuration (qi, ρi), it can compare all values assigned to the inner registers and registers by ˜ρ<sup>i</sup> and ρ<sup>i</sup> (with respect to the order relation on V ). In other words, it has access to the binary relation <sup>≺</sup><sup>i</sup> <sup>⊆</sup> (R˜ <sup>∪</sup> <sup>R</sup>)<sup>2</sup> such that for ˜r, <sup>s</sup>˜ <sup>∈</sup> <sup>R</sup>˜ and r, s <sup>∈</sup> <sup>R</sup>, we have ˜<sup>r</sup> <sup>≺</sup><sup>i</sup> <sup>r</sup> if and only if ˜ρi(˜r) < ρi(r), and analogously for <sup>r</sup> <sup>≺</sup><sup>i</sup> <sup>r</sup>˜, ˜<sup>r</sup> <sup>≺</sup><sup>i</sup> <sup>s</sup>˜, and <sup>r</sup> <sup>≺</sup><sup>i</sup> <sup>s</sup>. In particular, if ˜ρi(˜r) = <sup>⊥</sup>, then ˜<sup>r</sup> is incomparable with respect to <sup>≺</sup><sup>i</sup>. 
Equipped with this relation, $\Delta$ transitions to $(\tilde{q}_{i+1}, \tilde{\rho}_{i+1})$ by evaluating $\tilde{\delta}(\tilde{q}_i, q_i, \prec_i) = (\tilde{q}_{i+1}, \tilde{\alpha})$ and computing $\tilde{\rho}_{i+1}$ such that $\tilde{\rho}_{i+1}(\tilde{r}) = \tilde{\rho}_i(\tilde{s})$ if $\tilde{\alpha}(\tilde{r}) = \tilde{s}$, and $\tilde{\rho}_{i+1}(\tilde{r}) = \rho_i(s)$ if $\tilde{\alpha}(\tilde{r}) = s$, where $\tilde{r}, \tilde{s} \in \tilde{R}$ and $s \in R$. Finally, after having read the entire input sequence and reached the inner configuration $(\tilde{q}_{d+1}, \tilde{\rho}_{d+1})$, the transition maker outputs the local configuration $(q', \rho')$ such that $\tilde{o}(\tilde{q}_{d+1}) = (q', \tilde{\beta})$ and $\tilde{\beta}(r) = \tilde{r}$ implies $\rho'(r) = \tilde{\rho}_{d+1}(\tilde{r})$. Here we assume without loss of generality that $\Delta$ guarantees that $\rho'(r) \neq \bot$ for all $r \in R$.
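To make the sequential scanning concrete, the following Python sketch folds a transition maker over a neighborhood's local configurations. This is an illustration only, not code from the paper: the function names, the dictionary encoding of valuations, and the example inner automaton (which tracks the minimum of register `x`) are all invented.

```python
BOT = None  # the undefined value, standing in for the symbol ⊥

def run_transition_maker(delta, out, init_inner, inner_regs, neighborhood):
    """Scan a nonempty sequence of local configurations (q, rho) and
    produce a new local configuration (new_q, new_rho)."""
    inner_q = init_inner
    inner_rho = {r: BOT for r in inner_regs}
    for q, rho in neighborhood:
        # The order relation the inner automaton may inspect: all strict
        # comparisons between *defined* inner-register and register values.
        items = list(inner_rho.items()) + list(rho.items())
        prec = frozenset((a, b) for a, va in items for b, vb in items
                         if va is not BOT and vb is not BOT and va < vb)
        inner_q, assignment = delta(inner_q, q, prec)
        # assignment maps each inner register to an inner register or a register
        inner_rho = {r: inner_rho[s] if s in inner_rho else rho[s]
                     for r, s in assignment.items()}
    new_q, beta = out(inner_q)  # beta: registers -> inner registers
    new_rho = {r: inner_rho[s] for r, s in beta.items()}
    return new_q, new_rho

# Example inner automaton: keep the minimum value of register "x" in
# inner register "m", then write it back into "x".
def delta(inner_q, q, prec):
    if ("m", "x") in prec:          # m already holds a smaller value
        return inner_q, {"m": "m"}
    return inner_q, {"m": "x"}      # x smaller, or m still undefined

def out(inner_q):
    return "done", {"x": "m"}

new_q, new_rho = run_transition_maker(
    delta, out, "scan", {"m"},
    [("s", {"x": 5}), ("s", {"x": 3}), ("s", {"x": 7})])
```

Here the scan over valuations 5, 3, 7 leaves the minimum 3 in the output register.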

*Remark 4.* Recall that V = [n) for any graph G = (V,E) with n nodes. However, as registers cannot be compared with constants, this actually represents an arbitrary assignment of unique, totally ordered identifiers. To determine the smallest identifier (i.e., 0), the nodes can run an algorithm such as the following.

*Example 5 (Spanning tree).* We present a simple automaton A = (Q, R, ι, Δ, H, o) with input signature Σ*in* = (1, ∅) and output signature Σ*out* = (1, {*parent*, *root*}) that computes a (breadth-first) spanning tree of its input graph G = (V,E), rooted at the node with the smallest identifier. More precisely, in the computed output configuration C = (G, q, r), every node will store the identifier of its tree parent in register *parent* and the identifier of the root (i.e., the smallest identifier) in register *root*. Thus, as a side effect, A also solves the leader election problem by electing the root as the leader.

The automaton operates in three phases, which are represented by the set of states <sup>Q</sup> <sup>=</sup> {1, <sup>2</sup>, <sup>3</sup>}. A node terminates as soon as it reaches the third phase, i.e., we set <sup>H</sup> <sup>=</sup> {3}. Accordingly, the (trivial) input and output functions are <sup>ι</sup>: <sup>ε</sup> <sup>→</sup> 1 and <sup>o</sup>: 3 <sup>→</sup> <sup>ε</sup>. In addition to the output registers, each node has an auxiliary register *self* that will always store its own identifier. Thus, we choose <sup>R</sup> <sup>=</sup> {*self* , *parent*, *root*}. For the sake of simplicity, we describe the transition maker Δ in Algorithm 1 using pseudocode rules. However, it should be clear that these rules could be relatively easily implemented according to Definition 3.

All nodes start in state 1, which represents the tree-construction phase. By Rule 1, whenever an active node (i.e., a node in state 1 or 2) sees a neighbor whose *root* register contains a smaller identifier than the node's own *root* register, it updates its *parent* and *root* registers accordingly and switches to state 1. To resolve the nondeterminism in Rule 1, we stipulate that *nb* is chosen to be the neighbor with the smallest identifier among those whose *root* register contains the smallest value seen so far.

As can be easily shown by induction on the number of communication rounds, the nodes have to apply Rule 1 no more than diameter(G) times in order for the pointers in register *parent* to represent a valid spanning tree (where the root points to itself). However, since the nodes do not know when diameter(G) rounds have elapsed, they must also check that the current configuration does indeed represent a single tree, as opposed to a forest. They do so by propagating a signal, in the form of state 2, from the leaves up to the root.

By Rule 2, if an active node whose neighbors all agree on the same root realizes that it is a leaf or that all of its children are in state 2, then it switches to state 2 itself. Assuming the *parent* pointers in the current configuration already represent a single tree, Rule 2 ensures that the root will eventually be notified of this fact (when all of its children are in state 2). Otherwise, the *parent* pointers represent a forest, and every tree contains at least one node that has a neighbor outside of the tree (as we assume the underlying graph is connected).

Depending on the input graph, a node can switch arbitrarily often between states 1 and 2. Once the spanning tree has been constructed and every node is in state 2, the only node that knows this is the root. In order for the algorithm to terminate, Rule 3 then makes the root broadcast an acknowledgment message down the tree, which causes all nodes to switch to the halting state 3.
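The three rules can be simulated round by round. The following Python sketch is an illustration under the tie-breaking convention above, not the formal transition maker of Definition 3; the representation of states and registers as dictionaries is invented.

```python
def spanning_tree(adj):
    """Synchronous simulation of Example 5. adj: node id -> set of
    neighbor ids of a connected graph; ids are totally ordered."""
    ids = sorted(adj)
    state = {v: 1 for v in ids}                 # phases 1, 2, 3
    root = {v: v for v in ids}
    parent = {v: v for v in ids}
    while any(state[v] != 3 for v in ids):
        # snapshot: all nodes act on the previous round's configuration
        s, r, p = dict(state), dict(root), dict(parent)
        for v in ids:
            if s[v] == 3:
                continue
            if p[v] == v and s[v] == 2:         # Rule 3: root learns tree is complete
                state[v] = 3
            elif p[v] != v and s[p[v]] == 3:    # acknowledgment travels down
                state[v] = 3
            else:
                nb = min(adj[v], key=lambda u: (r[u], u))
                if r[nb] < r[v]:                # Rule 1: adopt a smaller root
                    parent[v], root[v], state[v] = nb, r[nb], 1
                else:
                    children = [u for u in adj[v] if p[u] == v]
                    agree = all(r[u] == r[v] for u in adj[v])
                    if agree and all(s[u] == 2 for u in children):
                        state[v] = 2            # Rule 2: leaf, or all children done
                    else:
                        state[v] = 1
    return parent, root

# A 4-cycle 0-1-2-3-0: the tree is rooted at 0.
parent, root = spanning_tree({0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}})
```

On the 4-cycle this yields the BFS tree with parents 1→0, 3→0, 2→1 and root 0 everywhere.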

Building on the automaton from Example 5, we now give an example of a graph property that can be decided in our model of distributed computing. The following automaton should be compared to the logical formula presented later in Example 8, which is much more compact and much easier to specify.

*Example 6 (Hamiltonian cycle).* We describe an automaton with input signature <sup>Σ</sup>*in* = (1, {*parent*, *root*}) and output signature <sup>Σ</sup>*out* = ({yes, no}, <sup>∅</sup>) that decides if the underlying graph G = (V,E) of its input configuration C = (G, q,r) is Hamiltonian, i.e., whether G contains a cycle that goes through each node exactly once. The automaton works under the assumption that r encodes a valid spanning tree of G in the registers *parent* and *root*, as constructed by the automaton from Example 5. Hence, by combining the two automata, we could easily construct a third one that decides the graph property of Hamiltonicity.

The automaton A = (Q, R, ι, Δ, H, o) presented here implements a simple backtracking algorithm that tries to traverse G along a Hamiltonian cycle. Its set of states is Q = ({*unvisited*, *visited*, *backtrack*} × {*idle*, *request*, *good*, *bad*}) ∪ H, with the set of halting states H = {yes, no}. Each non-halting state consists of two components, the first one serving for the backtracking procedure and the second one for communicating in the spanning tree. The input function ι initializes every node to the state (*unvisited*, *idle*), while the output function simply returns the answers chosen by the nodes, i.e., o: yes → yes, no → no. In addition to the input registers, each node has a register *self* storing its own identifier and a register *successor* to point to its successor in a (partially constructed) Hamiltonian path. That is, R = {*self*, *parent*, *root*, *successor*}. We now describe the algorithm in an informal way. It is, in principle, easy to implement in the transition maker Δ, but a thorough formalization would be rather cumbersome.

In the first round, the root marks itself as *visited* and updates its *successor* register to point towards its smallest neighbor (the one with the smallest identifier). Similarly, in each subsequent round, any *unvisited* node that is pointed to by one of its neighbors marks itself as *visited* and points towards its smallest *unvisited* neighbor. However, if all neighbors are already *visited*, the node instead sends the *backtrack* signal to its predecessor and switches back to *unvisited* (in the following round). Whenever a *visited* node receives the *backtrack* signal from its *successor* , it tries to update its *successor* to the next-smallest *unvisited* neighbor. If no such neighbor exists, it resets its *successor* pointer to itself, propagates the *backtrack* signal to its predecessor, and becomes *unvisited* in the following round.

There is only one exception to the above rules: if a node that is adjacent to the root cannot find any *unvisited* neighbor, it chooses the root as its *successor*. This way, the constructed path becomes a cycle. In order to check whether that cycle is Hamiltonian, the root now broadcasts a *request* down the spanning tree. If the *request* reaches an *unvisited* node, that node replies by sending the message *bad* towards the root. On the other hand, every *visited* leaf replies with the message *good*. While *bad* is always forwarded up to the root, *good* is only forwarded by nodes that receive this message from all of their children. If the root receives only *good*, then it knows that the current cycle is Hamiltonian and it switches to the halting state yes. This information is then broadcast through the entire graph, so that all nodes eventually accept. Otherwise, the root sends the *backtrack* signal to its predecessor, and the search for a Hamiltonian cycle continues. If there is none (in particular, if there is no cycle at all), the root will eventually receive the *backtrack* signal from its greatest neighbor, which indicates that all possibilities have been exhausted. If this happens, the root switches to the halting state no, and all other nodes eventually do the same.
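A centralized sketch can clarify what the distributed search computes: starting from the root (smallest identifier), always extend the path to the smallest unvisited neighbor first, and backtrack when stuck. This Python version mirrors only the search order, not the message-passing protocol; all names are invented.

```python
def hamiltonian_cycle(adj):
    """Backtracking search in the automaton's order: extend from the
    smallest id; try neighbors in increasing id order; close the cycle
    once every node has been visited. Returns a cycle as a list, or None."""
    ids = sorted(adj)
    start, n = ids[0], len(ids)

    def extend(path, visited):
        v = path[-1]
        if len(path) == n:                       # all nodes visited:
            return path if start in adj[v] else None   # can we close the cycle?
        for u in sorted(adj[v]):                 # smallest identifier first
            if u not in visited:
                res = extend(path + [u], visited | {u})
                if res:
                    return res
        return None                              # every neighbor visited: backtrack

    return extend([start], {start})

cycle = hamiltonian_cycle({0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}})
no_cycle = hamiltonian_cycle({0: {1}, 1: {0, 2}, 2: {1}})
```

The 4-cycle yields the traversal [0, 1, 2, 3]; the path graph has no Hamiltonian cycle.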

# **4 Functional Fixpoint Logic**

In order to introduce functional fixpoint logic, we first give a definition of first-order logic that suits our needs. Formulas will always be evaluated on *ordered*, undirected, connected, I-labeled graphs, where I is a fixed finite set of labels.

Throughout this paper, let N be an infinite supply of *node variables* and F be an infinite supply of *function variables*; we refer to them collectively as *variables*. The corresponding set of *terms* is generated by the grammar <sup>t</sup> ::= <sup>x</sup> <sup>|</sup> <sup>f</sup>(t), where <sup>x</sup> ∈ N and <sup>f</sup> ∈ F. With this, the set of *formulas* of *first-order logic* over <sup>I</sup> is given by the grammar

$$
\varphi ::= \langle a \rangle \, t \mid s < t \mid s \rightsquigarrow t \mid \neg \varphi \mid \varphi \vee \varphi \mid \exists x \, \varphi,
$$

where $s$ and $t$ are terms, $a \in I$, and $x \in \mathcal{N}$. As usual, we may also use the additional operators ∧, ⇒, ⇔, ∀ to make our formulas more readable, and we define the notations $s \leq t$, $s = t$, and $s \neq t$ as abbreviations for $\neg(t < s)$, $(s \leq t) \wedge (t \leq s)$, and $\neg(s = t)$, respectively.

The sets of *free variables* of a term t and a formula ϕ are denoted by free(t) and free(ϕ), respectively. While node variables can be bound by the usual quantifiers ∃ and ∀, function variables can be bound by a partial fixpoint operator that we will introduce below.

To interpret a formula ϕ on an I-labeled graph (G, q) with G = (V,E), we are given a *variable assignment* σ for the variables that occur freely in ϕ. This is a partial function $\sigma\colon \mathcal{N} \cup \mathcal{F} \rightharpoonup V \cup V^V$ such that $\sigma(x) \in V$ if $x$ is a free node variable and $\sigma(f) \in V^V$ if $f$ is a free function variable. We call $\sigma(x)$ and $\sigma(f)$ the *interpretations* of $x$ and $f$ under σ, and denote them by $x^\sigma$ and $f^\sigma$, respectively. For a composite term $t$, the corresponding interpretation $t^\sigma$ under σ is defined in the obvious way.
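For illustration, interpreting a composite term under σ is just structural recursion over the grammar $t ::= x \mid f(t)$. The following Python sketch uses an invented term encoding (a one-tuple for a node variable, a pair for a function application) and represents σ as a dictionary:

```python
def eval_term(term, sigma):
    """Interpret a term under assignment sigma. Terms are nested tuples:
    ("x",) is the node variable x, ("f", sub) is the application f(sub).
    sigma maps node variables to nodes and function variables to dicts."""
    if len(term) == 1:                    # a node variable
        return sigma[term[0]]
    f, sub = term                         # a function application
    return sigma[f][eval_term(sub, sigma)]

# Example: evaluate f(f(x)) with x = 0 and f the cyclic successor on {0,1,2}.
sigma = {"x": 0, "f": {0: 1, 1: 2, 2: 0}}
value = eval_term(("f", ("f", ("x",))), sigma)
```

Here f(f(x)) evaluates to 2.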

We write (G, <sup>q</sup>), σ <sup>|</sup><sup>=</sup> <sup>ϕ</sup> to denote that (G, <sup>q</sup>) *satisfies* <sup>ϕ</sup> under assignment <sup>σ</sup>. If <sup>ϕ</sup> does not contain any free variables, we simply write (G, <sup>q</sup>) <sup>|</sup><sup>=</sup> <sup>ϕ</sup> and refer to the set P of I-labeled graphs that satisfy ϕ as the graph property *defined* by ϕ. Naturally enough, we say that two devices (i.e., automata or formulas) are *equivalent* if they specify (i.e., decide or define) the same graph property and that two classes of devices are equivalent if their members specify the same class of graph properties.

As we assume that the reader is familiar with first-order logic, we only define the semantics of the atomic formulas (whose syntax is not completely standard):

$$
\begin{aligned}
(G, \mathfrak{q}), \sigma &\models \langle a \rangle\, t &&\text{iff} \quad \mathfrak{q}(t^\sigma) = a &&\text{("$t$ has label $a$"),}\\
(G, \mathfrak{q}), \sigma &\models s < t &&\text{iff} \quad s^\sigma < t^\sigma &&\text{("$s$ is smaller than $t$"),}\\
(G, \mathfrak{q}), \sigma &\models s \rightsquigarrow t &&\text{iff} \quad \{s^\sigma, t^\sigma\} \in E &&\text{("$s$ and $t$ are adjacent").}
\end{aligned}
$$

We now turn to *functional fixpoint logic*. Syntactically, it is defined as the extension of first-order logic that allows us to write formulas of the form

$$\text{pfp}\begin{bmatrix}f\_1 \colon \varphi\_1(f\_1, \dots, f\_\ell, \text{IN,OUT})\\ \vdots\\ f\_\ell \colon \varphi\_\ell(f\_1, \dots, f\_\ell, \text{IN,OUT})\end{bmatrix} \psi \,,\tag{\*}$$

where $f_1, \dots, f_\ell \in \mathcal{F}$, $\text{in}, \text{out} \in \mathcal{N}$, and $\varphi_1, \dots, \varphi_\ell, \psi$ are formulas. We use the notation "$\varphi_i(f_1, \dots, f_\ell, \text{in}, \text{out})$" to emphasize that $f_1, \dots, f_\ell, \text{in}, \text{out}$ may occur freely in $\varphi_i$ (possibly among other variables). The free variables of formula (∗) are given by $\bigcup_{i \in (\ell]} \bigl(\mathrm{free}(\varphi_i) \setminus \{f_1, \dots, f_\ell, \text{in}, \text{out}\}\bigr) \cup \bigl(\mathrm{free}(\psi) \setminus \{f_1, \dots, f_\ell\}\bigr)$.

The idea is that the *partial fixpoint operator* pfp binds the function variables $f_1, \dots, f_\ell$. The lines in square brackets constitute a system of function definitions that provide an interpretation of $f_1, \dots, f_\ell$, using the special node variables $\text{in}$ and $\text{out}$ as helpers to represent input and output values. This is why pfp also binds any free occurrences of $\text{in}$ and $\text{out}$ in $\varphi_1, \dots, \varphi_\ell$, but not in $\psi$.

To specify the semantics of (∗), we first need to make some preliminary observations. As before, we consider a fixed I-labeled graph (G, q) with G = (V,E) and assume that we are given a variable assignment σ for the free variables of (∗). With respect to (G, q) and σ, each formula $\varphi_i$ induces an operator $F_{\varphi_i}\colon (V^V)^\ell \to V^V$ that takes some interpretation of the function variables $f_1, \dots, f_\ell$ and outputs a new interpretation of $f_i$, corresponding to the function graph defined by $\varphi_i$ via the node variables $\text{in}$ and $\text{out}$. For inputs on which $\varphi_i$ does not define a functional relationship, the new interpretation of $f_i$ behaves like the identity function. More formally, given a variable assignment $\hat{\sigma}$ that extends σ with interpretations of $f_1, \dots, f_\ell$, the operator $F_{\varphi_i}$ maps $f_1^{\hat{\sigma}}, \dots, f_\ell^{\hat{\sigma}}$ to the function $f_i^{\text{new}}$ such that for all $u \in V$,

$$f\_i^{\text{new}}(u) = \begin{cases} v & \text{if } v \text{ is the unique node in } V \text{ s.t. } (G, \mathfrak{q}), \hat{\sigma}[\text{IN}, \text{OUT} \mapsto u, v] \models \varphi\_i, \\ u & \text{otherwise.} \end{cases}$$

Here, $\hat{\sigma}[\text{in}, \text{out} \mapsto u, v]$ is the extension of $\hat{\sigma}$ interpreting $\text{in}$ as $u$ and $\text{out}$ as $v$.

In this way, the operators $F_{\varphi_1}, \dots, F_{\varphi_\ell}$ give rise to an infinite sequence $(f_1^k, \dots, f_\ell^k)_{k \geq 0}$ of tuples of functions, called *stages*, where the initial stage contains solely the identity function $\text{id}_V$ and each subsequent stage is obtained from its predecessor by componentwise application of the operators. More formally,

$$f\_i^0 = \text{id}\_V = \{ u \mapsto u \mid u \in V \} \qquad \text{and} \qquad f\_i^{k+1} = F\_{\varphi\_i}(f\_1^k, \dots, f\_\ell^k),$$

for $i \in (\ell]$ and $k \geq 0$. Now, since we have not imposed any restrictions on the formulas $\varphi_i$, this sequence might never stabilize, i.e., it is possible that $(f_1^k, \dots, f_\ell^k) \neq (f_1^{k+1}, \dots, f_\ell^{k+1})$ for all $k \geq 0$. Otherwise, the sequence reaches a (simultaneous) fixpoint at some position $k$ no greater than $|V|^{|V| \cdot \ell}$ (the number of $\ell$-tuples of functions on $V$).

We define the *partial fixpoint* $(f_1^\infty, \dots, f_\ell^\infty)$ of the operators $F_{\varphi_1}, \dots, F_{\varphi_\ell}$ to be the reached fixpoint if it exists, and the tuple of identity functions otherwise. That is, for $i \in (\ell]$,

$$f\_i^{\infty} = \begin{cases} f\_i^k & \text{if there exists } k \ge 0 \text{ such that } f\_j^k = f\_j^{k+1} \text{ for all } j \in (\ell],\\ \text{id}\_V & \text{otherwise.} \end{cases}$$

Having introduced the necessary background, we can finally provide the semantics of the formula $\text{pfp}[f_i \colon \varphi_i]_{i \in (\ell]}\, \psi$ presented in (∗):

$$(G, \mathfrak{q}), \sigma \models \text{pfp}[f_i \colon \varphi_i]_{i \in (\ell]}\, \psi \qquad \text{iff} \qquad (G, \mathfrak{q}), \sigma[f_i \mapsto f_i^{\infty}]_{i \in (\ell]} \models \psi,$$

where $\sigma[f_i \mapsto f_i^\infty]_{i \in (\ell]}$ is the extension of σ that interprets $f_i$ as $f_i^\infty$, for $i \in (\ell]$. In other words, the formula $\text{pfp}[f_i \colon \varphi_i]_{i \in (\ell]}\, \psi$ can intuitively be read as

"if $f_1, \dots, f_\ell$ are interpreted as the partial fixpoint of $\varphi_1, \dots, \varphi_\ell$, then $\psi$ holds".
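The stage semantics can be mirrored directly in code. The following Python sketch is illustrative only: it assumes the operators $F_{\varphi_i}$ are already given as Python functions on dictionaries, and it implements the fallback to identity functions when the stages do not stabilize within the a-priori bound.

```python
def partial_fixpoint(operators, V, max_iter=None):
    """Compute the partial fixpoint of simultaneous operators
    F_1, ..., F_l : (V^V)^l -> V^V, each given as a Python function taking
    l dicts and returning one dict. Stage 0 is the tuple of identities; if
    no fixpoint is reached within |V|**(|V|*l) steps, fall back to
    identities, as in the pfp semantics."""
    l = len(operators)
    if max_iter is None:
        max_iter = len(V) ** (len(V) * l)   # number of l-tuples of functions
    ident = tuple({u: u for u in V} for _ in range(l))
    stage = ident
    for _ in range(max_iter + 1):
        nxt = tuple(F(*stage) for F in operators)
        if nxt == stage:
            return stage                    # fixpoint reached
        stage = nxt
    return ident                            # never stabilized: identities

V = [0, 1, 2]
# Converging example: repeatedly add 1, capped at 2 -> constant function 2.
cap = partial_fixpoint([lambda f: {u: min(f[u] + 1, 2) for u in V}], V)
# Diverging example: cyclic +1 mod 3 never stabilizes -> identities.
cyc = partial_fixpoint([lambda f: {u: (f[u] + 1) % 3 for u in V}], V)
```

The first call converges to the constant-2 function; the second cycles forever and therefore falls back to the identity.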

#### **Syntactic Sugar**

Before we consider a concrete formula (in Example 8), we first introduce some "syntactic sugar" to make using functional fixpoint logic more pleasant.

*Set variables.* According to our definition of functional fixpoint logic, the operator pfp can bind only function variables. However, functions can be used to encode sets of nodes in a straightforward manner: any set U may be represented by a function that maps nodes outside of U to themselves and nodes inside U to nodes distinct from themselves. Therefore, we may fix an infinite supply $\mathcal{S}$ of *set variables*, and extend the syntax of first-order logic to allow atomic formulas of the form $t \in X$, where $t$ is a term and $X$ is a set variable in $\mathcal{S}$. Naturally, the semantics is that "$t$ is an element of $X$". To bind set variables, we can then write partial fixpoint formulas of the form $\text{pfp}\bigl[(f_i \colon \varphi_i)_{i \in (\ell]}, (X_i \colon \vartheta_i)_{i \in (m]}\bigr]\, \psi$, where $f_1, \dots, f_\ell \in \mathcal{F}$, $X_1, \dots, X_m \in \mathcal{S}$, and $\varphi_1, \dots, \varphi_\ell, \vartheta_1, \dots, \vartheta_m, \psi$ are formulas. The stages of the partial fixpoint induction are computed as before, but each set variable $X_i$ is initialized to $\emptyset$, and falls back to $\emptyset$ in case the sequence of stages does not converge to a fixpoint.
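The set-as-function encoding can be sketched in a few lines of Python (invented helper names; it assumes $|V| \geq 2$ so that a node distinct from any given one exists):

```python
def set_to_function(U, V):
    """Encode U ⊆ V as a function V -> V: members of U are mapped to some
    node other than themselves, non-members to themselves."""
    return {u: (u if u not in U else next(v for v in V if v != u))
            for u in V}

def function_to_set(f):
    """Decode: u belongs to the encoded set iff f(u) != u."""
    return {u for u, v in f.items() if v != u}

f = set_to_function({1, 2}, [0, 1, 2])
U = function_to_set(f)
```

Round-tripping recovers the original set, so a function variable can faithfully play the role of a set variable.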

*Quantifiers over functions and sets.* Partial fixpoint inductions allow us to iterate over various interpretations of function and set variables and thus provide a way of expressing (second-order) quantification over functions and sets. Since we restrict ourselves to graphs whose nodes are totally ordered, we can easily define a suitable order of iteration and a corresponding partial fixpoint induction that traverses all possible interpretations of a given function or set variable. To make this more convenient, we enrich the language of functional fixpoint logic with second-order quantifiers, allowing us to write formulas of the form <sup>∃</sup>f ϕ and <sup>∃</sup>X ϕ, where <sup>f</sup> ∈ F, <sup>X</sup> ∈ S, and <sup>ϕ</sup> is a formula. Obviously, the semantics is that "there exists a function f, or a set X, respectively, such that ϕ holds".

As a consequence, it is possible to express any graph property definable in *monadic second-order logic*, the extension of first-order logic with set quantifiers.

**Corollary 7.** *When restricted to finite graphs equipped with a total order, functional fixpoint logic is strictly more expressive than monadic second-order logic.*

The strictness of the inclusion in the above corollary follows from the fact that even on totally ordered graphs, Hamiltonicity cannot be defined in monadic second-order logic (see, e.g., the proof in [4, Prp. 5.13]). As the following example shows, this property is easy to express in functional fixpoint logic.

*Example 8 (Hamiltonian cycle).* The following formula of functional fixpoint logic defines the graph property of Hamiltonicity. That is, an unlabeled graph G satisfies this formula if and only if there exists a cycle in G that goes through each node exactly once.

$$\exists f \left[ \begin{aligned} & \forall x \big( f(x) \rightsquigarrow x \big) \land \forall x\, \exists y \big[ f(y) = x \land \forall z \big( f(z) = x \Rightarrow z = y \big) \big] \land \\ & \forall X \Big( \big[ \exists x (x \in X) \land \forall y \big( y \in X \Rightarrow f(y) \in X \big) \big] \Rightarrow \forall y (y \in X) \Big) \end{aligned} \right]$$

Here, $x, y, z \in \mathcal{N}$, $X \in \mathcal{S}$, and $f \in \mathcal{F}$. Intuitively, we represent a given Hamiltonian cycle by a function f that tells us, for each node x, which of x's neighbors we should visit next in order to traverse the entire cycle. Thus, f actually represents a directed version of the cycle.

To ensure the existence of a Hamiltonian cycle, our formula states that there is a function f satisfying the following two conditions. By the first line, each node x must have exactly one f-predecessor and one f-successor, both of which must be neighbors of x. By the second line, if we start at any node x and collect into a set X all the nodes reachable from x (by following the path specified by <sup>f</sup>), then <sup>X</sup> must contain all nodes.
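A direct way to see what the formula demands is to check a candidate f explicitly. This Python sketch (names invented) verifies the two conditions for a given successor function on a graph given as an adjacency dictionary:

```python
def is_hamiltonian_witness(f, adj):
    """Check whether f : V -> V witnesses a Hamiltonian cycle.
    First line of the formula: f(x) is a neighbor of x, and every x has
    exactly one f-predecessor (i.e., f is a bijection on V).
    Second line: following f from any node reaches every node."""
    V = set(adj)
    succ_ok = all(f[x] in adj[x] for x in V)
    pred_ok = sorted(f.values()) == sorted(V)      # exactly one predecessor each

    def reaches_all(x):
        X, y = {x}, f[x]                           # close {x} under f
        while y not in X:
            X.add(y)
            y = f[y]
        return X == V

    return succ_ok and pred_ok and all(reaches_all(x) for x in V)

adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}   # the 4-cycle
good = is_hamiltonian_witness({0: 1, 1: 2, 2: 3, 3: 0}, adj)
bad = is_hamiltonian_witness({0: 1, 1: 0, 2: 3, 3: 2}, adj)   # two 2-cycles
```

The second candidate satisfies the local conditions but splits V into two orbits, so the reachability condition rejects it, just as the second line of the formula does.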

#### **5 Translating Between Automata and Logic**

Having introduced both automata and logic, we can proceed to explain the first part of Theorem 1 (stated in Sect. 1), i.e., how distributed register automata can be translated into functional fixpoint logic.

**Proposition 9.** *For every distributed register automaton that decides a graph property, we can construct an equivalent formula of functional fixpoint logic.*

*Proof (sketch).* Given a distributed register automaton A = (Q, R, ι, Δ, H, o) deciding a graph property P over label set I, we can construct a formula $\varphi_A$ of functional fixpoint logic that defines P. For each state $q \in Q$, our formula uses a set variable $X_q$ to represent the set of nodes of the input graph that are in state $q$. Also, for each register $r \in R$, it uses a function variable $f_r$ to represent the function that maps each node $u$ to the node $v$ whose identifier is stored in $u$'s register $r$. By means of a partial fixpoint operator, we enforce that on any I-labeled graph $(G, \mathfrak{q})$, the final interpretations of $(X_q)_{q \in Q}$ and $(f_r)_{r \in R}$ represent the halting configuration reached by A on $(G, \mathfrak{q})$. The main formula is simply

$$\varphi_A := \text{pfp}\begin{bmatrix} (X_q \colon \varphi_q)_{q \in Q} \\ (f_r \colon \varphi_r)_{r \in R} \end{bmatrix} \forall x \Bigl(\bigvee_{p \in H \colon o(p) = \text{yes}} x \in X_p\Bigr),$$

which states that all nodes end up in a halting state that outputs yes.

Basically, the subformulas (ϕq)<sup>q</sup>∈<sup>Q</sup> and (ϕr)<sup>r</sup>∈<sup>R</sup> can be constructed in such a way that for all <sup>i</sup> <sup>∈</sup> <sup>N</sup>, the (<sup>i</sup> + 1)-th stage of the partial fixpoint induction represents the configuration reached by A in the i-th round. To achieve this, each of the subformulas contains a nested partial fixpoint formula describing the result computed by the transition maker Δ between two consecutive synchronous rounds, using additional set and function variables to encode the inner configurations of Δ at each node. Thus, each stage of the nested partial fixpoint induction corresponds to a single step in the transition maker's sequential scanning process.

Let us now consider the opposite direction and sketch how to go from functional fixpoint logic to distributed register automata.

**Proposition 10.** *For every formula of functional fixpoint logic that defines a graph property, we can construct an equivalent distributed register automaton.*

*Proof (sketch).* We proceed by structural induction: each subformula ϕ will be evaluated by a dedicated automaton $A_\varphi$, and several such automata can then be combined to build an automaton for a composite formula. For this purpose, it is convenient to design *centralized* automata, which operate on a given spanning tree (as computed in Example 5) and are coordinated by the root in a fairly sequential manner. In $A_\varphi$, each free node variable x of ϕ is represented by a corresponding input register x whose value at the root is the current interpretation $x^\sigma$ of x. Similarly, to represent a function variable f, every node v has a register f storing $f^\sigma(v)$. The nodes also possess some auxiliary registers whose purpose will be explained below. In the end, for any formula ϕ (potentially with free variables), we will have an automaton $A_\varphi$ computing a transduction $T_{A_\varphi}\colon \mathcal{C}(I, \{\textit{parent}, \textit{root}\} \cup \mathrm{free}(\varphi)) \to \mathcal{C}(\{\text{yes}, \text{no}\}, \emptyset)$, where *parent* and *root* are supposed to constitute a spanning tree. The computation is triggered by the root, which means that the other nodes are waiting for a signal to wake up.

**Algorithm 2.** $A_\varphi$ for $\varphi = \text{pfp}[f_i \colon \varphi_i]_{i \in [1:\ell]}\, \psi$, as controlled by the root

    1  init(A_inc)
    2  repeat
    3      @every node do for i ∈ [1:ℓ] do f_i ← f_i^new
    4      for i ∈ [1:ℓ] do update(f_i^new)
    5      if @every node (∀i ∈ [1:ℓ] : f_i^new = f_i) then goto 8
    6  until execute(A_inc) returns no        /* until global counter at maximum */
    7  @every node do for i ∈ [1:ℓ] do f_i ← self
    8  execute(A_ψ)

Essentially, the nodes involved in the evaluation of ϕ collect some information, send it towards the root, and go back to sleep. The root then returns yes or no, depending on whether or not ϕ holds in the input graph under the variable assignment provided by the input registers. Centralizing A<sup>ϕ</sup> in that way makes it very convenient (albeit not efficient) to evaluate composite formulas. For example, in <sup>A</sup><sup>ϕ</sup>∨<sup>ψ</sup>, the root will first run <sup>A</sup>ϕ, and then <sup>A</sup><sup>ψ</sup> in case <sup>A</sup><sup>ϕ</sup> returns no.

The evaluation of atomic formulas is straightforward. So let us focus on the most interesting case, namely when $\varphi = \text{pfp}[f_i \colon \varphi_i]_{i \in (\ell]}\, \psi$. The root's program is outlined in Algorithm 2. Line 1 initializes a counter that ranges from $0$ to $n^{n\ell} - 1$, where $n$ is the number of nodes in the input graph. This counter is distributed in the sense that every node has some dedicated registers that together store the current counter value. Every execution of $A_{\text{inc}}$ will increment the counter by 1, or return no if its maximum value has been exceeded. Now, in each iteration of the loop starting at Line 2, all registers $f_i$ and $f_i^{\text{new}}$ are updated in such a way that they represent the current and next stage, respectively, of the partial fixpoint induction. For the former, it suffices that every node copies, for all $i$, the contents of $f_i^{\text{new}}$ to $f_i$ (Line 3). To update $f_i^{\text{new}}$, Line 4 calls a subroutine *update*($f_i^{\text{new}}$) whose effect is that $f_i^{\text{new}} = F_{\varphi_i}((f_i)_{i \in (\ell]})$ for all $i$, where $F_{\varphi_i}\colon (V^V)^\ell \to V^V$ is the operator defined in Sect. 4. Line 5 checks whether we have reached a fixpoint: the root asks every node to compare, for all $i$, its registers $f_i^{\text{new}}$ and $f_i$. The corresponding truth value is propagated back to the root, where *false* is given preference over *true*. If the result is *true*, we exit the loop and proceed with calling $A_\psi$ to evaluate ψ (Line 8). Otherwise, we try to increment the global counter by executing $A_{\text{inc}}$ (Line 6). If the latter returns no, the fixpoint computation is aborted because we know that it has reached a cycle.
In accordance with the partial fixpoint semantics, all nodes then write their own identifier to every register <sup>f</sup><sup>i</sup> (Line 7) before <sup>ψ</sup> is evaluated (Line 8).

#### **6 Conclusion**

This paper makes some progress in the development of a descriptive distributed complexity theory by establishing a logical characterization of a wide class of network algorithms, modeled as distributed register automata.

In our translation from logic to automata, we did not pay much attention to algorithmic efficiency. In particular, we made extensive use of centralized subroutines that are triggered and controlled by a leader process. A natural question for future research is to identify cases where we can understand a distributed architecture as an opportunity that allows us to evaluate formulas faster. In other words, is there an expressive fragment of functional fixpoint logic that gives rise to efficient distributed algorithms in terms of running time? What about the required number of messages? We are then entering the field of automatic *synthesis of practical distributed algorithms* from logical specifications. This is a worthwhile task, as it is often much easier to declare what should be done than how it should be done (cf. Examples 6 and 8).

As far as the authors are aware, this area is still relatively unexplored. However, one noteworthy advance was made by Grumbach and Wu in [9], where they investigated distributed evaluation of first-order formulas on bounded-degree graphs and planar graphs. We hope to follow up on this in future work.

**Acknowledgments.** We thank Matthias Függer for helpful discussions. Work supported by ERC *EQualIS* (FP7-308087) (http://www.lsv.fr/~bouyer/equalis) and ANR *FREDDA* (17-CE40-0013) (https://www.irif.fr/anr/fredda/index).

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **The Impatient May Use Limited Optimism to Minimize Regret**

Michaël Cadilhac<sup>1</sup>, Guillermo A. Pérez<sup>2(B)</sup>, and Marie van den Bogaard<sup>3</sup>

<sup>1</sup> University of Oxford, Oxford, UK, michael@cadilhac.name; <sup>2</sup> University of Antwerp, Antwerp, Belgium, guillermoalberto.perez@uantwerpen.be; <sup>3</sup> Université libre de Bruxelles, Brussels, Belgium, marie.van.den.bogaard@ulb.ac.be

**Abstract.** Discounted-sum games provide a formal model for the study of reinforcement learning, where the agent is enticed to get rewards early since later rewards are discounted. When the agent interacts with the environment, she may realize that, with hindsight, she could have increased her reward by playing differently: this difference in outcomes constitutes her *regret value*. The agent may thus elect to follow a *regret-minimal* strategy. In this paper, it is shown that (1) there always exist regret-minimal strategies that are admissible, a strategy being inadmissible if there is another strategy that always performs better; (2) computing the minimum possible regret or checking that a strategy is regret-minimal can be done in coNP<sup>NP</sup>, disregarding the computational cost of numerical analysis (otherwise, this bound becomes PSpace).

**Keywords:** Admissibility · Discounted-sum games · Regret minimization

# **1 Introduction**

A pervasive model used to study the strategies of an agent in an unknown environment is *two-player infinite horizon games played on finite weighted graphs*. Therein, the set of vertices of a graph is split between two players, Adam and Eve, playing the roles of the environment and the agent, respectively. The play starts in a given vertex, and each player decides where to go next when the play reaches one of their vertices. Questions asked about these games are usually of the form: *Does there exist a strategy of Eve such that...?* For such a question to be well-formed, one should provide:


The valuation function can be Boolean, in which case one says that Eve *wins* or *loses* (one very classical example has Eve winning if the maximum value appearing infinitely often along the edges is even). In this setting, it is often assumed that Adam is adversarial, and the question then becomes: *Can Eve always win?* (The names of the players stem from this view: *is there* a strategy of ∃ve that *always* beats ∀dam?) The literature on that subject spans more than 35 years, with newly found applications to this day (see [4] for comprehensive lecture notes, and [7] for an example of recent use in the analysis of attacks in cryptocurrencies).

The valuation function can also aggregate the numerical values along the edges into a reward value. We focus in this paper on *discounted sum*: if w is the weight of the edge taken at the n-th step, Eve's reward grows by λ<sup>n</sup> · w, where λ ∈ (0, 1) is a prescribed discount factor. Discounting future rewards is a classical notion used in economics [18], Markov decision processes [9,16], systems theory [1], and is at the heart of Q-learning, a reinforcement learning technique widely used in machine learning [19]. In this setting, we consider three attitudes towards the environment:


In this paper, we single out a class of strategies for Eve that first follow a best-case optimal strategy, then switch to a worst-case optimal strategy after some precise time; we call these strategies *optipess*. Our main contributions are then:

1. Optipess strategies are not only regret-minimal (a fact established in [13]) but also admissible—note that there are regret-minimal strategies that are not admissible and *vice versa*. On the way, we show that for any strategy of Eve there is an admissible strategy that performs at least as well; this is a peculiarity of discounted-sum games.


*Structure of the Paper.* Notations and definitions are introduced in Sect. 2. The study of admissibility appears in Sect. 3, and is independent from the complexity analysis of regret. The main algorithm devised in this paper (point 2 above) is presented in Theorem 5, Sect. 6; it relies on technical lemmas that are the focus of Sects. 4 and 5. We encourage the reader to go through the statements of the lemma sections, then through the proof of Theorem 5, to get a good sense of the role each lemma plays.

In more detail, in Sect. 4 we provide a crucial lemma that allows us to represent long paths succinctly, and in Sect. 5 we argue that the important values of a game (regret, best-case, worst-case) have short witnesses. In Sect. 6, we use these lemmas to devise our algorithms.

# **2 Preliminaries**

We assume familiarity with basic graph and complexity theory. Some more specific definitions and known results are recalled here.

*Game, Play, History.* A *(discounted-sum) game* G is a tuple (V, v<sub>0</sub>, V<sub>∃</sub>, E, w, λ) where V is a finite set of vertices, v<sub>0</sub> is the starting vertex, V<sub>∃</sub> ⊆ V is the subset of vertices that belong to Eve, E ⊆ V × V is a set of directed edges, w: E → Z is an (edge-)weight function, and 0 < λ < 1 is a rational *discount factor*. The vertices in V \ V<sub>∃</sub> are said to belong to Adam. Since we consider games played for an infinite number of turns, we will always assume that every vertex has at least one outgoing edge.

A *play* is an infinite path v<sub>1</sub>v<sub>2</sub> ··· ∈ V<sup>ω</sup> in the digraph (V, E). A *history* h = v<sub>1</sub> ··· v<sub>n</sub> is a finite path. The *length of* h, written |h|, is the number of *edges* it contains: |h| <sup>def</sup>= n − 1. The set **Hist** consists of all histories that start in v<sub>0</sub> and end in a vertex from V<sub>∃</sub>.

*Strategies.* A *strategy of Eve* in G is a function σ that maps histories ending in some vertex v ∈ V<sub>∃</sub> to a neighbouring vertex v′ (i.e., (v, v′) ∈ E). The strategy σ is *positional* if for all histories h, h′ ending in the same vertex, σ(h) = σ(h′). *Strategies of Adam* are defined similarly.

A history h = v<sub>1</sub> ··· v<sub>n</sub> is said to be *consistent with a strategy* σ of Eve if for all i ≥ 2 such that v<sub>i</sub> ∈ V<sub>∃</sub>, we have σ(v<sub>1</sub> ··· v<sub>i−1</sub>) = v<sub>i</sub>. Consistency with strategies of Adam is defined similarly. We write **Hist**(σ) for the set of histories in **Hist** that are consistent with σ. A play is consistent with a strategy (of either player) if all its prefixes are consistent with it.

Given a vertex v and both Adam and Eve's strategies, τ and σ respectively, there is a unique play starting in v that is consistent with both, called the *outcome* of τ and σ on v. This play is denoted **out**<sup>v</sup>(σ, τ ).

For a strategy σ of Eve and a history h ∈ **Hist**(σ), we let σ<sub>h</sub> be the strategy of Eve that assumes h has already been played. Formally, σ<sub>h</sub>(h′) = σ(h · h′) for any history h′ (we will use this notation only on histories h′ that start with the ending vertex of h).

*Values.* The *value of a history* h = v<sub>1</sub> ··· v<sub>n</sub> is the discounted sum of the weights on the edges:

$$\mathbf{Val}(h) \stackrel{\text{def}}{=} \sum\_{i=1}^{|h|} \lambda^{i-1} w(v\_i, v\_{i+1}) \ .$$

The *value of a play* is simply the limit of the values of its prefixes.
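For concreteness, the discounted value of a finite history can be computed directly from its sequence of edge weights. The following minimal Python sketch is illustrative only; the encoding of a history as a list of edge weights is ours, not the paper's:

```python
def history_value(weights, lam):
    """Discounted sum of a history's edge weights: the weight of the
    i-th edge (0-indexed) is scaled by lam**i."""
    return sum((lam ** i) * w for i, w in enumerate(weights))

# A history with edge weights 2, 0, 3 and discount factor 1/2:
# 2 + 0.5 * 0 + 0.25 * 3 = 2.75
print(history_value([2, 0, 3], 0.5))
```

The value of a play is then approximated arbitrarily well by long prefixes, since the tail is bounded by λ<sup>n</sup> times the maximal absolute weight over 1 − λ.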

The *antagonistic value* of a strategy σ of Eve with history h = v<sub>1</sub> ··· v<sub>n</sub> is the value Eve achieves when Adam tries to hinder her, after h:

$$\mathbf{aVal}^h(\sigma) \stackrel{\text{def}}{=} \mathbf{Val}(h) + \lambda^{|h|} \cdot \inf\_{\tau} \mathbf{Val}(\mathbf{out}^{v\_n}(\sigma\_h, \tau)) \ ,$$

where τ ranges over all strategies of Adam. The *collaborative value* **cVal**<sup>h</sup>(σ) is defined in a similar way, by substituting "sup" for "inf." We write **aVal**<sup>h</sup> (resp. **cVal**<sup>h</sup>) for the best antagonistic (resp. collaborative) value achievable by Eve with any strategy.
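The optimal values **aVal**<sup>v</sup> and **cVal**<sup>v</sup> can be computed by the standard fixpoint iteration for discounted games; this is classical background, not the paper's algorithm. A sketch under our own encoding (a game as a successor map from vertices to lists of weighted edges):

```python
def value_iteration(vertices, eve, edges, lam, adam_max=False, iters=2000):
    """Fixpoint computation of aVal^v (Eve maximizes, Adam minimizes)
    for every vertex v; with adam_max=True both players maximize,
    yielding cVal^v. `edges` maps a vertex to (successor, weight) pairs.
    The update is a contraction with factor lam, so it converges."""
    val = {v: 0.0 for v in vertices}
    for _ in range(iters):
        val = {v: (max if (v in eve or adam_max) else min)(
                   w + lam * val[u] for (u, w) in edges[v])
               for v in vertices}
    return val
```

For instance, on the two-vertex cycle a → b (weight 1, a belonging to Eve) and b → a (weight 0, b belonging to Adam) with λ = 1/2, the iteration converges to aVal<sup>a</sup> = 4/3 and aVal<sup>b</sup> = 2/3.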

*Types of Strategies.* A strategy σ of Eve is *strongly worst-case optimal* (SWO) if for every history h we have **aVal**<sup>h</sup>(σ) = **aVal**<sup>h</sup>; it is *strongly best-case optimal* (SBO) if for every history h we have **cVal**<sup>h</sup>(σ) = **cVal**<sup>h</sup>.

We single out a class of SWO strategies that perform well if Adam turns out to be helping. A SWO strategy σ of Eve is *strongly best worst-case optimal* (SBWO) if for every history h we have **cVal**<sup>h</sup>(σ) = **acVal**<sup>h</sup>, where:

$$\mathbf{acVal}^h \stackrel{\text{def}}{=} \sup\{\mathbf{cVal}^h(\sigma') \mid \sigma' \text{ is a SWO strategy of Eve}\} \ .$$

In the context of discounted-sum games, strategies that are positional and strongly optimal always exist. Furthermore, the set of all such strategies can be characterized by local conditions.

**Lemma 1 (Follows from [20, Theorem 5.1]).** *There exist positional SWO, SBO, and SBWO strategies in every game. For any positional strategy* σ *of Eve:*

$$\begin{array}{l} \text{– } (\forall v \in V) \left[ \mathbf{aVal}^{v}(\sigma) = \mathbf{aVal}^{v} \right] \text{ iff } \sigma \text{ is SWO;}\\ \text{– } (\forall v \in V) \left[ \mathbf{cVal}^{v}(\sigma) = \mathbf{cVal}^{v} \right] \text{ iff } \sigma \text{ is SBO;}\\ \text{– } (\forall v \in V) \left[ \mathbf{aVal}^{v}(\sigma) = \mathbf{aVal}^{v} \wedge \mathbf{cVal}^{v}(\sigma) = \mathbf{acVal}^{v} \right] \text{ iff } \sigma \text{ is SBWO.} \end{array}$$

*Regret.* The *regret* of a strategy σ of Eve is the maximal difference between the value obtained by using σ and the value obtained by using an alternative strategy:

$$\mathbf{Reg}\left(\sigma\right) \stackrel{\text{def}}{=} \sup\_{\tau} \left( \left( \sup\_{\sigma'} \mathbf{Val}(\mathbf{out}^{v\_0}(\sigma', \tau)) \right) - \mathbf{Val}(\mathbf{out}^{v\_0}(\sigma, \tau)) \right) \,,$$

where τ and σ′ range over all strategies of Adam and Eve, respectively. The *(minimal) regret of* G is then **Reg** <sup>def</sup>= inf<sub>σ</sub> **Reg**(σ).

Regret can also be characterized by considering the point in history when Eve should have done things differently. Formally, for any vertices u and v let **cVal**<sup>u</sup><sub>¬v</sub> be the maximal **cVal**<sup>u</sup>(σ) over strategies σ verifying σ(u) ≠ v. Then:

**Lemma 2 ([13, Lemma 13]).** *For all strategies* σ *of Eve:*

$$\mathbf{Reg}\left(\sigma\right) = \sup \left\{ \lambda^n \left( \mathbf{cVal}\_{\neg\sigma(h)}^{v\_n} - \mathbf{aVal}^{v\_n}(\sigma\_h) \right) \, \middle| \, h = v\_0 \cdots v\_n \in \mathbf{Hist}(\sigma) \right\} \ .$$

*Switching and Optipess Strategies.* Given strategies σ<sub>1</sub>, σ<sub>2</sub> of Eve and a *threshold function* t: V<sub>∃</sub> → N ∪ {∞}, we define the *switching strategy* σ<sub>1</sub> <sup>t</sup>→ σ<sub>2</sub> for any history h = v<sub>1</sub> ··· v<sub>n</sub> ending in V<sub>∃</sub> as:

$$
\sigma\_1 \xrightarrow{t} \sigma\_2(h) = \begin{cases}
\sigma\_2(h) & \text{if } (\exists i)[i \ge t(v\_i)], \\
\sigma\_1(h) & \text{otherwise.}
\end{cases}
$$

We refer to histories for which the first condition above holds as *switched histories*, and to all others as *unswitched histories*. The strategy σ = σ<sub>1</sub> <sup>t</sup>→ σ<sub>2</sub> is said to be *bipositional* if both σ<sub>1</sub> and σ<sub>2</sub> are positional. Note that in that case, for all histories h, if h is switched then σ<sub>h</sub> = σ<sub>2</sub>, and otherwise σ<sub>h</sub> is the same as σ but with t(v) changed to max{0, t(v) − |h|} for all v ∈ V<sub>∃</sub>. In particular, if |h| is greater than max{t(v) < ∞}, then σ<sub>h</sub> is nearly positional: it switches to σ<sub>2</sub> as soon as it sees a vertex with t(v) ≠ ∞.
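The switching rule can be sketched operationally. In the following Python sketch (the function names, the encoding of histories as vertex lists, and the 0-indexed position convention are ours, and the paper's indexing may differ), σ<sub>1</sub>'s choice is followed until some visited vertex's threshold has been passed:

```python
def switching_strategy(sigma1, sigma2, t):
    """Sketch of the switching strategy sigma1 -t-> sigma2.
    A history is a list of vertices; sigma1/sigma2 map histories to
    vertices; t maps vertices to thresholds (a vertex absent from t
    has threshold infinity and thus never triggers the switch)."""
    def sigma(history):
        # The history is switched if some position i satisfies i >= t(v_i).
        switched = any(i >= t.get(v, float('inf'))
                       for i, v in enumerate(history))
        return (sigma2 if switched else sigma1)(history)
    return sigma
```

For example, with t = {'a': 2}, the history ['a', 'b'] is unswitched (position 0 < 2), while ['b', 'b', 'a'] is switched (vertex 'a' occurs at position 2 ≥ 2).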

A strategy σ is *perfectly optimistic-then-pessimistic* (*optipess*, for short) if there are positional SBO and SBWO strategies σ<sup>sbo</sup> and σ<sup>sbwo</sup> such that σ = σ<sup>sbo</sup> <sup>t</sup>→ σ<sup>sbwo</sup>, where t(v) = inf{ i ∈ N | λ<sup>i</sup> (**cVal**<sup>v</sup> − **aVal**<sup>v</sup>) ≤ **Reg** }.

**Theorem 1 ([13]).** *For all optipess strategies* σ *of Eve,* **Reg** (σ) = **Reg***.*

*Conventions. As we have done so far, we will assume throughout the paper that a game* G *is fixed—with the notable exception of the results on complexity, in which we assume that the game is given with all numbers in binary. Regarding strategies, we assume that bipositional strategies are given as two positional strategies and a threshold function encoded as a table with binary-encoded entries.*

*Example 1.* Consider the following game, where round vertices are owned by Eve, and square ones by Adam. The double edges represent Eve's positional strategy σ:

Eve's strategy has a regret value of 2λ<sup>2</sup>/(1 − λ). This is realized when Adam plays from v<sub>0</sub> to v<sub>1</sub>, from v′<sub>1</sub> to x, and from v′<sub>2</sub> to y. Against that strategy, Eve ensures a discounted-sum value of 0 by playing according to σ, while regretting not having played to v′<sub>1</sub> to obtain 2λ<sup>2</sup>/(1 − λ).

# **3 Admissible Strategies and Regret**

There is no reason for Eve to choose a strategy that is consistently worse than another one. This classical idea is formalized using the notions of *strategy domination* and *admissible strategies*. In this section, which is independent from the rest of the paper, we study the relation between admissible and regret-minimal strategies. Let us start by formally introducing the relevant notions:

**Definition 1.** *Let* σ1, σ<sup>2</sup> *be two strategies of Eve. We say that* σ<sup>1</sup> *is* weakly dominated *by* <sup>σ</sup><sup>2</sup> *if* **Val**(**out**<sup>v</sup><sup>0</sup> (σ1, τ )) <sup>≤</sup> **Val**(**out**<sup>v</sup><sup>0</sup> (σ2, τ )) *for every strategy* <sup>τ</sup> *of Adam. We say that* σ<sup>1</sup> *is* dominated *by* σ<sup>2</sup> *if* σ<sup>1</sup> *is* weakly dominated *by* σ<sup>2</sup> *but not conversely. A strategy* σ *of Eve is* admissible *if it is not dominated by any other strategy.*

In other words, admissible strategies are maximal elements for the weak-domination pre-order.

*Example 2.* Consider the following game, where the strategy σ of Eve is shown by the double edges:

This strategy guarantees a discounted-sum value of 6λ<sup>2</sup>/(1 − λ) against any strategy of Adam. Furthermore, it is worst-case optimal, since playing to v<sub>1</sub> instead of v<sub>2</sub> would give Adam the opportunity to ensure a strictly smaller value by playing to v′<sub>1</sub>. The latter also implies that σ is admissible. Interestingly, playing to v<sub>1</sub> is also an admissible behavior for Eve since, against a strategy of Adam that does not play to v′<sub>1</sub> from v<sub>1</sub>, it obtains 10λ<sup>2</sup>/(1 − λ) > 6λ<sup>2</sup>/(1 − λ).

The two examples above can be used to argue that the sets of strategies that are regret minimal and admissible, respectively, are in fact incomparable.

**Proposition 1.** *There are regret-optimal strategies that are not admissible and admissible strategies that have suboptimal regret.*

*Proof (Sketch).* Consider once more the game depicted in Example 1 and recall that the strategy σ of Eve corresponding to the double edges has minimal regret. This strategy is *not* admissible: it is dominated by the alternative strategy σ′ of Eve that behaves like σ from v<sub>1</sub> but plays to v′<sub>2</sub> from v<sub>2</sub>. Indeed, if Adam plays to v<sub>1</sub> from v<sub>0</sub>, then the outcomes of σ and σ′ are the same. However, if Adam plays to v<sub>2</sub>, then the value of the outcome of σ is 0 while the value of the outcome of σ′ is strictly greater than 0.

Similarly, the strategy σ depicted by double edges in the game from Example 2 is admissible but *not* regret-minimizing. In fact, the strategy σ′ that consists in playing to v<sub>1</sub> from v<sub>0</sub> has a smaller regret.

In the rest of this section, we show that (1) any strategy is weakly dominated by an admissible strategy; (2) being dominated entails more regret; (3) optipess strategies are both regret-minimal and admissible. We will need the following:

**Lemma 3 ([6]).** *A strategy* σ *of Eve is admissible if and only if for every history* h ∈ **Hist**(σ) *the following holds: either* **cVal**<sup>h</sup>(σ) > **aVal**<sup>h</sup> *or* **aVal**<sup>h</sup>(σ) = **cVal**<sup>h</sup>(σ) = **aVal**<sup>h</sup> = **acVal**<sup>h</sup>*.*

The above characterization of admissible strategies in so-called *well-formed games* was proved in [6, Theorem 11]. Lemma 3 follows from the fact that discounted-sum games are well-formed.

#### **3.1 Any Strategy Is Weakly Dominated by an Admissible Strategy**

We show that discounted-sum games have the distinctive property that every strategy is weakly dominated by an admissible strategy. This is in stark contrast with most cases where admissibility has been studied previously [6].

**Theorem 2.** *Any strategy of Eve is weakly dominated by an admissible strategy.*

*Proof (Sketch).* The main idea is to construct, based on σ, a strategy σ′ that switches to a SBWO strategy as soon as σ does not satisfy the characterization of Lemma 3. The first part of the argument consists in showing that σ is indeed weakly dominated by σ′. This is easily done by comparing, against each strategy τ of Adam, the values of σ and σ′. The second part consists in verifying that σ′ is indeed admissible. This is done by checking that each history h consistent with σ′ satisfies the characterization of Lemma 3, that is, **cVal**<sup>h</sup>(σ′) > **aVal**<sup>h</sup> or **aVal**<sup>h</sup>(σ′) = **cVal**<sup>h</sup>(σ′) = **aVal**<sup>h</sup> = **acVal**<sup>h</sup>.

#### **3.2 Being Dominated Is Regretful**

**Theorem 3.** *For all strategies* σ, σ′ *of Eve such that* σ *is weakly dominated by* σ′*, it holds that* **Reg**(σ′) ≤ **Reg**(σ)*.*

*Proof.* Let σ, σ′ be such that σ is weakly dominated by σ′. This means that for every strategy τ of Adam, we have **Val**(π) ≤ **Val**(π′) where π = **out**<sup>v<sub>0</sub></sup>(σ, τ) and π′ = **out**<sup>v<sub>0</sub></sup>(σ′, τ). Consequently, for every such τ we obtain:

$$\left(\sup\_{\sigma^{\prime\prime}} \mathbf{Val}(\mathbf{out}^{v\_0}(\sigma^{\prime\prime}, \tau))\right) - \mathbf{Val}(\pi^{\prime}) \le \left(\sup\_{\sigma^{\prime\prime}} \mathbf{Val}(\mathbf{out}^{v\_0}(\sigma^{\prime\prime}, \tau))\right) - \mathbf{Val}(\pi) \ .$$

As this holds for any τ, we conclude that sup<sub>τ</sub> sup<sub>σ″</sub> (**Val**(**out**<sup>v<sub>0</sub></sup>(σ″, τ)) − **Val**(**out**<sup>v<sub>0</sub></sup>(σ′, τ))) ≤ sup<sub>τ</sub> sup<sub>σ″</sub> (**Val**(**out**<sup>v<sub>0</sub></sup>(σ″, τ)) − **Val**(**out**<sup>v<sub>0</sub></sup>(σ, τ))), that is, **Reg**(σ′) ≤ **Reg**(σ).

It follows from Proposition 1, however, that the converse of the theorem is false.

#### **3.3 Optipess Strategies Are both Regret-Minimal and Admissible**

Recall that there are admissible strategies that are not regret-minimal and *vice versa* (Proposition 1). However, as a direct consequence of Theorems 2 and 3, there always exist regret-minimal admissible strategies. It turns out that optipess strategies, which are regret-minimal (Theorem 1), are also admissible:

**Theorem 4.** *All optipess strategies of Eve are admissible.*

*Proof.* Let σ = σ<sup>sbo</sup> <sup>t</sup>→ σ<sup>sbwo</sup> be an optipess strategy; we show that it is admissible. To this end, let h = v<sub>0</sub> ··· v<sub>n</sub> ∈ **Hist**(σ); we show that one of the properties of Lemma 3 holds. There are two cases:

*(*h *is switched.)* In that case, σ<sub>h</sub> = σ<sup>sbwo</sup>. Since σ<sup>sbwo</sup> is an SBWO strategy, **cVal**<sup>h</sup>(σ<sup>sbwo</sup>) = **acVal**<sup>h</sup>. Now if **acVal**<sup>h</sup> > **aVal**<sup>h</sup>, then:

$$\mathbf{cVal}^h(\sigma) = \mathbf{cVal}^h(\sigma^{\text{sbwo}}) = \mathbf{acVal}^h > \mathbf{aVal}^h \ ,$$

and σ satisfies the first property of Lemma 3. Otherwise, **acVal**<sup>h</sup> = **aVal**<sup>h</sup> and the second property holds: we have **cVal**<sup>h</sup>(σ) = **acVal**<sup>h</sup>, and as σ<sup>sbwo</sup> is an SWO strategy with **aVal**<sup>h</sup>(σ) = **aVal**<sup>h</sup>(σ<sup>sbwo</sup>), we also have **aVal**<sup>h</sup>(σ) = **aVal**<sup>h</sup>.

*(*h *is unswitched.)* We show that **cVal**<sup>h</sup>(σ) > **aVal**<sup>h</sup>. Since h is unswitched, we have in particular that:

$$\mathbf{Reg}\left(\sigma\right) = \mathbf{Reg} < \lambda^n \left(\mathbf{cVal}^{v\_n} - \mathbf{aVal}^{v\_n}\right) \ . \tag{1}$$

Furthermore:

$$\begin{split} \lambda^n \left( \mathbf{c} \mathbf{V} \mathbf{al}^{v\_n} - \mathbf{a} \mathbf{V} \mathbf{al}^{v\_n} \right) &= \left( \mathbf{Val}(h) + \lambda^n \mathbf{c} \mathbf{Val}^{v\_n} \right) - \left( \mathbf{Val}(h) + \lambda^n \mathbf{a} \mathbf{Val}^{v\_n} \right) \\ &= \mathbf{c} \mathbf{Val}^h - \mathbf{a} \mathbf{Val}^h \end{split}$$

and combining the previous equation with Eq. 1, we obtain:

$$\mathbf{cVal}^h - \mathbf{Reg}\left(\sigma\right) > \mathbf{aVal}^h \ .$$

To conclude, we show that **Reg**(σ) ≥ **cVal**<sup>h</sup> − **cVal**<sup>h</sup>(σ). Consider a strategy τ of Adam such that h is consistent with both σ<sup>sbo</sup> and τ, and satisfying **Val**(**out**<sup>v<sub>0</sub></sup>(σ<sup>sbo</sup>, τ)) = **cVal**<sup>h</sup>. (That such a τ exists is intuitively clear, since σ has been following the SBO strategy σ<sup>sbo</sup> along h.) It holds immediately that **cVal**<sup>h</sup>(σ) ≥ **Val**(**out**<sup>v<sub>0</sub></sup>(σ, τ)). Now, by definition of the regret:

$$\begin{split} \mathbf{Reg}\left(\sigma\right) &\geq \mathbf{Val}(\mathbf{out}^{v\_{0}}(\sigma^{\text{sbo}},\tau)) - \mathbf{Val}(\mathbf{out}^{v\_{0}}(\sigma,\tau)) \\ &\geq \mathbf{cVal}^{h} - \mathbf{cVal}^{h}(\sigma) \ . \end{split}$$

# **4 Minimal Values Are Witnessed by a Single Iterated Cycle**

We start our technical work towards a better algorithm to compute the regret value of a game. Here, we show that there are succinctly presentable histories that witness small values in the game. Our intention is to later use this result to apply a modified version of Lemma 2 to bipositional strategies to argue there are small witnesses of a strategy having too much regret.

More specifically, we show that for any history h, there is another history h′ of the same length that has smaller value and is of the form h′ = α · β<sup>k</sup> · γ where |αβγ| is small. This will allow us to find the smallest possible value among exponentially long histories by guessing α, β, γ, and k, all of which are small. This property holds for a wealth of different valuation functions, hinting at possible further applications. For discounted-sum games, the following suffices to prove the desired property.

**Lemma 4.** *For any history* h = α · β · γ *with* α *and* γ *same-length cycles:*

$$\min\{\mathbf{Val}(\alpha^2 \cdot \beta), \mathbf{Val}(\beta \cdot \gamma^2)\} \le \mathbf{Val}(h) \ .$$
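The inequality of Lemma 4 can be checked numerically on concrete weight sequences. In this illustrative sketch (the encoding of α, β, γ as lists of edge weights is ours, with α and γ standing for same-length cycles so that the exchanged histories are well-defined):

```python
def val(ws, lam):
    """Discounted sum of a weight sequence."""
    return sum(lam ** i * w for i, w in enumerate(ws))

def check_lemma4(alpha, beta, gamma, lam):
    """Check min{Val(a.a.b), Val(b.g.g)} <= Val(a.b.g) for weight
    sequences alpha, beta, gamma with |alpha| = |gamma|."""
    assert len(alpha) == len(gamma)
    lhs = min(val(alpha + alpha + beta, lam),
              val(beta + gamma + gamma, lam))
    return lhs <= val(alpha + beta + gamma, lam) + 1e-12  # float slack
```

Note that all three histories compared have the same length, so the comparison is between plain discounted sums.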

Within the proof of the key lemma of this section, and later on when we use it (Lemma 9), we will rely on the following notion of cycle decomposition:

**Definition 2.** *A* simple-cycle decomposition *(SCD) is a pair consisting of paths and iterated simple cycles. Formally, an SCD is a pair* D = ⟨(α<sub>i</sub>)<sup>n</sup><sub>i=0</sub>, (β<sub>j</sub>, k<sub>j</sub>)<sup>n</sup><sub>j=1</sub>⟩*, where each* α<sub>i</sub> *is a path, each* β<sub>j</sub> *is a simple cycle, and each* k<sub>j</sub> *is a positive integer. We write* D(j) = β<sup>k<sub>j</sub></sup><sub>j</sub> · α<sub>j</sub> *and* D() = α<sub>0</sub> · D(1)D(2) ··· D(n)*.*

By carefully iterating Lemma 4, we have:

**Lemma 5.** *For any history* h *there exists a history* h′ = α · β<sup>k</sup> · γ *with:*


*Proof.* In this proof, we focus on SCDs in which each path α<sub>i</sub> is simple; we call them ßCDs. We define a well-founded partial order on ßCDs. Let D = ⟨(α<sub>i</sub>)<sup>n</sup><sub>i=0</sub>, (β<sub>j</sub>, k<sub>j</sub>)<sup>n</sup><sub>j=1</sub>⟩ and D′ = ⟨(α′<sub>i</sub>)<sup>n′</sup><sub>i=0</sub>, (β′<sub>j</sub>, k′<sub>j</sub>)<sup>n′</sup><sub>j=1</sub>⟩ be two ßCDs; we write D′ < D iff all of the following hold:


That this order has no infinite descending chain is clear. We show two claims:


Together, they imply that for a smallest ßCD D, D() is of the required form. Indeed, let j be the unique value for which k<sub>j</sub> > |V|; then the statement of the lemma is satisfied by letting α = α<sub>0</sub> · D(1) ··· D(j − 1), β = β<sub>j</sub>, k = k<sub>j</sub>, and γ = α<sub>j</sub> · D(j + 1) ··· D(n).

*Claim 1.* Suppose D has n > |V|. Since all cycles are simple, there are two cycles β<sub>j</sub>, β<sub>j′</sub>, j < j′, of the same length. We can apply Lemma 4 to the path β<sub>j</sub> · (α<sub>j</sub>D(j + 1) ··· D(j′ − 1)) · β<sub>j′</sub>, and remove one of the two cycles while duplicating the other; we thus obtain a similar path of smaller value. This can be done repeatedly until we obtain a path with only one of the two cycles, say β<sub>j′</sub>, the other case being similar. Substituting this path in D() results in:

$$\alpha\_0 \cdot D(1) \cdots D(j-1) \cdot \left(\alpha\_j \cdot D(j+1) \cdots D(j'-1) \cdot \beta\_{j'}^{k\_j+k\_{j'}}\right) \cdot \alpha\_{j'} \cdot D(j'+1) \cdots D(n) \ .$$

This gives rise to a smaller ßCD as follows. If α<sub>j−1</sub>α<sub>j</sub> is still a simple path, then the above history is expressible as an ßCD with a smaller number of cycles. Otherwise, we rewrite α<sub>j−1</sub>α<sub>j</sub> = α′<sub>j−1</sub> β′<sub>j</sub> α′<sub>j</sub> where α′<sub>j−1</sub> and α′<sub>j</sub> are simple paths and β′<sub>j</sub> is a simple cycle; since |α′<sub>j−1</sub>α′<sub>j</sub>| < |α<sub>j−1</sub>α<sub>j</sub>|, the resulting ßCD is smaller.

*Claim 2.* Suppose D has two k<sub>j</sub>, k<sub>j′</sub> > |V|, j < j′. Since each cycle in the ßCD is simple, k<sub>j</sub> and k<sub>j′</sub> are greater than both |β<sub>j</sub>| and |β<sub>j′</sub>|; let us write k<sub>j</sub> = b|β<sub>j′</sub>| + r with 0 ≤ r < |β<sub>j′</sub>|, and similarly, k<sub>j′</sub> = b′|β<sub>j</sub>| + r′. We have:

$$D(j)\cdots D(j') = \beta\_j^r \cdot \left( (\beta\_j^{|\beta\_{j'}|})^b \cdot \alpha\_j \cdot D(j+1) \cdots D(j'-1) \cdot (\beta\_{j'}^{|\beta\_j|})^{b'} \right) \cdot \beta\_{j'}^{r'} \cdot \alpha\_{j'} \ .$$

Noting that β<sub>j</sub><sup>|β<sub>j′</sub>|</sup> and β<sub>j′</sub><sup>|β<sub>j</sub>|</sup> are cycles of the same length, we can transfer all the occurrences of one to the other, as in Claim 1. Similarly, if two simple paths get merged and give rise to a cycle, a smaller ßCD can be constructed; if not, then there are now at most r < |V| occurrences of β<sub>j</sub> (or, conversely, r′ of β<sub>j′</sub>), again resulting in a smaller ßCD.

# **5 Short Witnesses for Regret, Antagonistic, and Collaborative Values**

We continue our technical work towards our algorithm for computing the regret value. In this section, the overarching theme is that of *short witnesses*. We show that (1) the regret value of a strategy is witnessed by histories of bounded length; (2) the collaborative value of a game is witnessed by a simple path and an iterated cycle; (3) the antagonistic value of a strategy is witnessed by an SCD and an iterated cycle.

#### **5.1 Regret Is Witnessed by Histories of Bounded Length**

**Lemma 6.** *Let* σ = σ<sub>1</sub> <sup>t</sup>→ σ<sub>2</sub> *be an arbitrary bipositional switching strategy of Eve, and let* C = 2|V| + max{t(v) < ∞}*. We have that:*

$$\mathbf{Reg}\left(\sigma\right) = \max \left\{ \lambda^n \left( \mathbf{cVal}\_{\neg \sigma(h)}^{v\_n} - \mathbf{aVal}^{v\_n}(\sigma\_h) \right) \, \middle| \, h = v\_0 \dots v\_n \in \mathbf{Hist}(\sigma),\ n \le C \right\} \ .$$

*Proof.* Consider a history h of length greater than C, and write h = h<sub>1</sub> · h<sub>2</sub> with |h<sub>1</sub>| = max{t(v) < ∞}. Let h<sub>2</sub> = p · p′ where p is the maximal prefix of h<sub>2</sub> such that h<sub>1</sub> · p is unswitched (we set p = ε if h<sub>1</sub> is switched). Note that one of p or p′ is longer than |V| (say p, the other case being similar). This implies that there is a cycle in p, i.e., p = α · β · γ with β a cycle. Let h′ = h<sub>1</sub> · α · γ · p′; this history has the same starting and ending vertex as h. Moreover, since |h<sub>1</sub>| is larger than any finite value of the threshold function, σ<sub>h</sub> = σ<sub>h′</sub>. Lastly, h′ is still in **Hist**(σ), since the removed cycle played no role in switching strategies. This shows:

$$\mathbf{cVal}^{v\_n}\_{\neg\sigma(h)} - \mathbf{aVal}^{v\_n}(\sigma\_h) = \mathbf{cVal}^{v\_n}\_{\neg\sigma(h')} - \mathbf{aVal}^{v\_n}(\sigma\_{h'}) \ .$$

Since h′ is shorter than h, the discount factor λ<sup>|h′|</sup> is greater than λ<sup>|h|</sup>, so h′ witnesses a regret value at least as high as that witnessed by h. There is thus no need to consider histories of length greater than C.

It may seem, from this lemma and the fact that t(v) may be very large, that we need to guess extremely long histories. However, since we consider bipositional switching strategies, we will only be interested in guessing *some* properties of these histories, which are not hard to verify:

**Lemma 7.** *The following problem is decidable in* NP*: given a game, a bipositional switching strategy* σ*, a number* n *in binary, a Boolean* b*, and two vertices* v, v′*, is there an* h ∈ **Hist**(σ) *of length* n*, switched iff* b*, ending in* v*, with* σ(h) = v′*?*

*Proof.* This is done by guessing multiple flows within the graph (V, E). Here, we call a *flow* a valuation of the edges E by integers that describes the number of times a path crosses each edge. Given a vector in N<sup>E</sup>, it is not hard to check whether there is a path that it represents, and to extract the initial and final vertices of that path [17].
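One standard way to decide whether an edge-flow vector represents a single path is via Eulerian-path conditions on the induced multigraph: degree balance except at one source and one sink, plus connectivity. The following sketch of that check is our own illustration, not the construction of [17]:

```python
from collections import defaultdict

def flow_is_path(flow):
    """Check whether an edge flow (dict (u, v) -> positive count) is
    realizable as one path: the induced multigraph must admit an
    Eulerian path (balanced degrees up to one source/sink) and be
    weakly connected."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for (u, v), k in flow.items():
        out_deg[u] += k
        in_deg[v] += k
    nodes = set(out_deg) | set(in_deg)
    starts = [v for v in nodes if out_deg[v] - in_deg[v] == 1]
    ends = [v for v in nodes if in_deg[v] - out_deg[v] == 1]
    balanced = all(abs(out_deg[v] - in_deg[v]) <= 1 for v in nodes)
    if not (balanced and len(starts) <= 1 and len(ends) <= 1):
        return False
    # Weak connectivity over the edges carrying positive flow.
    adj = defaultdict(set)
    for (u, v) in flow:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v])
    return seen == nodes
```

For instance, the flow {(a,b): 1, (b,a): 1, (a,c): 1} is realizable as the path a b a c, whereas two disconnected edges are not realizable as one path.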

We first order the different thresholds of the strategy σ = σ<sub>1</sub> <sup>t</sup>→ σ<sub>2</sub>: let V<sub>∃</sub> = {v<sub>1</sub>, v<sub>2</sub>,...,v<sub>k</sub>} with t(v<sub>i</sub>) ≤ t(v<sub>i+1</sub>) for all i. We analyze the structure of histories consistent with σ. Let h ∈ **Hist**(σ), and write h = h′ · h″ where h′ is the maximal unswitched prefix of h. Naturally, h′ is consistent with σ<sub>1</sub> and h″ is consistent with σ<sub>2</sub>. Then h′ = h<sub>0</sub>h<sub>1</sub> ··· h<sub>i</sub>, for some i < |V<sub>∃</sub>|, with:


To confirm the existence of a history with the given parameters, it is thus sufficient to guess the value i ≤ |V<sub>∃</sub>| and to guess i connected flows (rather than paths) with the above properties that are consistent with σ<sub>1</sub>. Finally, we guess a flow for h″ consistent with σ<sub>2</sub> if we need a switched history, and verify that it starts at a switching vertex. The flows must sum to n + 1, with the last vertex being v′ and the previous one v.

#### **5.2 Short Witnesses for the Collaborative and Antagonistic Values**

**Lemma 8.** *There is a set* P *of pairs* (α, β) *with* α *a simple path and* β *a simple cycle such that:*

*–* **cVal**<sup>v<sub>0</sub></sup> = max{**Val**(α · β<sup>ω</sup>) | (α, β) ∈ P} *and*

*– membership in* P *is decidable in polynomial time w.r.t. the game.*

*Proof.* We argue that the set P of all pairs (α, β) with α a simple path, β a simple cycle, and such that α · β is a path, gives us the result.

The first part of the claim is a consequence of Lemma 1: consider positional SBO strategies τ and σ of Adam and Eve, respectively. Since they are positional, the play **out**<sup>v<sub>0</sub></sup>(σ, τ) is of the form α · β<sup>ω</sup>, as required, and its value is **cVal**<sup>v<sub>0</sub></sup>. We can thus let P be the set of all pairs obtained from such SBO strategies.

Moreover, it can easily be checked that for every pair (α, β) such that α · β is a path in the game, there exists a pair of strategies with outcome α · β<sup>ω</sup>. (Note that verifying whether α · β is a path can indeed be done in polynomial time given α and β.) Finally, the value **Val**(α · β<sup>ω</sup>) will, by definition, be at most **cVal**<sup>v<sub>0</sub></sup>.
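The value of an ultimately periodic play α · β<sup>ω</sup> has a closed form, Val(α) + λ<sup>|α|</sup> · Val(β)/(1 − λ<sup>|β|</sup>), obtained by summing the geometric series over the repetitions of β. A small Python sketch (the encoding of α and β as lists of edge weights is ours):

```python
def lasso_value(alpha, beta, lam):
    """Value of the play alpha . beta^omega, computed in closed form:
    Val(alpha) + lam^|alpha| * Val(beta) / (1 - lam^|beta|)."""
    val = lambda ws: sum(lam ** i * w for i, w in enumerate(ws))
    return val(alpha) + (lam ** len(alpha)) * val(beta) / (1 - lam ** len(beta))

# A single weight-1 self-loop with lam = 1/2: 1 / (1 - 1/2) = 2.
print(lasso_value([], [1], 0.5))
```

This makes the maximum over the (polynomially checkable) pairs in P effectively computable once λ and the weights are fixed.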

**Lemma 9.** *Let* σ *be a bipositional switching strategy of Eve. There is a set* K *of pairs* (D, β) *with* D *an SCD and* β *a simple cycle such that:*


*Proof.* We will prove that the set K of all pairs (D, β) with D an SCD of polynomial length (specified below), β a simple cycle, and such that D() · β is a path, satisfies our claims.

Let C = max{t(v) | t(v) < ∞}, and consider a play π consistent with σ that achieves the value **aVal**<sup>v<sub>0</sub></sup>(σ). Write π = h · π′ with |h| = C, and let v be the final vertex of h. Naturally:

$$\mathbf{aVal}^{v_0}(\sigma) = \mathbf{Val}(\pi) = \mathbf{Val}(h) + \lambda^{|h|}\,\mathbf{Val}(\pi')\ .$$

We first show how to replace π′ by some α · β<sup>ω</sup>, with α a simple path and β a simple cycle. First, since π witnesses **aVal**<sup>v<sub>0</sub></sup>(σ), we have that **Val**(π′) = **aVal**<sup>v</sup>(σ<sub>h</sub>). Now σ<sub>h</sub> is positional, because |h| ≥ C.<sup>1</sup> It is known that there are optimal positional antagonistic strategies τ for Adam, that is, strategies satisfying **aVal**<sup>v</sup>(σ<sub>h</sub>) = **Val**(**out**<sup>v</sup>(σ<sub>h</sub>, τ)). As in the proof of Lemma 8, this implies that **aVal**<sup>v</sup>(σ<sub>h</sub>) = **Val**(α · β<sup>ω</sup>) = **Val**(π′) for some α and β; additionally, any (α, β) consistent with σ<sub>h</sub> and a potential strategy for Adam gives rise to a larger value.

We now argue that **Val**(h) is witnessed by an SCD of polynomial size. This bears similarity to the proof of Lemma 7. Specifically, we will reuse the fact that histories consistent with σ can be split into histories played "between thresholds."

Let us write σ = σ<sub>1</sub> →<sub>t</sub> σ<sub>2</sub>. Again, we let V<sub>∃</sub> = {v<sub>1</sub>, v<sub>2</sub>,...,v<sub>k</sub>} with t(v<sub>i</sub>) ≤ t(v<sub>i+1</sub>) for all i, and write h = h′ · h″ where h′ is the maximal unswitched prefix of h. We note that h′ is consistent with σ<sub>1</sub> and h″ is consistent with σ<sub>2</sub>. Then h′ = h<sub>0</sub>h<sub>1</sub> ··· h<sub>i</sub>, for some i < |V<sub>∃</sub>|, with:


We now diverge from the proof of Lemma 7. We apply Lemma 5 to each h<sub>j</sub> in the game where the strategy σ<sub>1</sub> is hardcoded (that is, we first remove every edge (u, v) ∈ V<sub>∃</sub> × V that does not satisfy σ<sub>1</sub>(u) = v). We obtain a history h′<sub>0</sub>h′<sub>1</sub> ··· h′<sub>i</sub> that is still in **Hist**(σ), thanks to the previous splitting of h. We also apply Lemma 5 to h″, this time in the game where σ<sub>2</sub> is hardcoded, obtaining h‴. Since each h′<sub>j</sub> and h‴ are expressed as α · β<sup>k</sup> · γ, there is an SCD D with no more

<sup>1</sup> Technically, σ<sub>h</sub> is positional in the game that records whether the switch was made.

than |V<sub>∃</sub>| elements that satisfies **Val**(D(⋆)) ≤ **Val**(h)—naturally, since **Val**(h) is minimal and D(⋆) ∈ **Hist**(σ), this means that the two values are equal. Note that it is not hard, given an SCD D, to check whether D(⋆) ∈ **Hist**(σ), and that SCDs that are not valued **Val**(h) have a larger value.

# **6 The Complexity of Regret**

We are finally equipped to present our algorithms. To account for the cost of numerical analysis, we rely on the problem PosSLP [2]. This problem consists in determining whether an arithmetic circuit with addition, subtraction, and multiplication gates, together with input values, evaluates to a positive integer. PosSLP is known to be decidable in the so-called counting hierarchy, itself contained in the set of problems decidable using polynomial space.
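To fix intuitions, a PosSLP instance can be evaluated directly with big-integer arithmetic; the difficulty is that the circuit's value can need exponentially many bits in the circuit size. The gate encoding below is our own illustrative convention, not from the paper:

```python
# Hedged sketch: direct evaluation of a PosSLP instance. The gate
# encoding (a list of ('const', n) and (op, i, j) entries, where i, j
# refer to earlier gates) is hypothetical.

def eval_circuit(gates):
    """Evaluate a straight-line program bottom-up."""
    vals = []
    for gate in gates:
        if gate[0] == 'const':
            vals.append(gate[1])
        else:
            op, i, j = gate
            a, b = vals[i], vals[j]
            vals.append({'+': a + b, '-': a - b, '*': a * b}[op])
    return vals[-1]

def pos_slp(gates):
    """Is the circuit's output a positive integer?"""
    return eval_circuit(gates) > 0

# n squarings of 2 compute 2^(2^n): the value's bit size is exponential
# in the circuit size, which is why naive evaluation is not polynomial.
squarings = [('const', 2), ('*', 0, 0), ('*', 1, 1), ('*', 2, 2)]
```

The repeated-squaring example is exactly why the problem is not obviously in NP: certificates of positivity would have to mention numbers with exponentially many bits.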

**Theorem 5.** *The following problem is decidable in* NP<sup>PosSLP</sup>*: Given: A game, a bipositional switching strategy* σ*, a value* r ∈ ℚ *in binary. Question: Is* **Reg**(σ) > r*?*

*Proof.* Let us write σ = σ<sub>1</sub> →<sub>t</sub> σ<sub>2</sub>. Lemma 6 indicates that **Reg**(σ) > r holds if there is a history h of some length n ≤ C = 2|V| + max{t(v) | t(v) < ∞}, ending in some v<sub>n</sub>, such that:

$$
\lambda^n \left( \mathbf{cVal}_{\neg \sigma(h)}^{v_n} - \mathbf{aVal}^{v_n} (\sigma_h) \right) > r \; . \tag{2}
$$

Note that since σ is bipositional, we do not need to know everything about h. Indeed, the following properties suffice: its length n, its final vertex v<sub>n</sub>, the vertex v′ = σ(h), and whether it is switched. Rather than guessing h, we can thus rely on Lemma 7 to get the required information. We start by simulating the NP machine that this lemma provides, and verify that n, v<sub>n</sub>, and v′ are consistent with a potential history.

Let us now concentrate on the collaborative value that we need to evaluate in Eq. 2. To compute **cVal**, we rely on Lemma 8, which we apply in the game where v<sub>n</sub> is set initial and its successor is forced not to be v′. We guess a pair (α<sub>c</sub>, β<sub>c</sub>) ∈ P; we thus have **Val**(α<sub>c</sub> · β<sub>c</sub><sup>ω</sup>) ≤ **cVal**<sup>v<sub>n</sub></sup><sub>¬σ(h)</sub>, with at least one guessed pair (α<sub>c</sub>, β<sub>c</sub>) reaching that latter value.

Let us now focus on computing **aVal**<sup>v<sub>n</sub></sup>(σ<sub>h</sub>). Since σ is a bipositional switching strategy, σ<sub>h</sub> is simply σ where t(v) is changed to max{0, t(v) − n}. Lemma 9 can thus be used to compute our value. To do so, we guess a pair (D, β<sub>a</sub>) ∈ K; we thus have **Val**(D(⋆) · β<sub>a</sub><sup>ω</sup>) ≥ **aVal**<sup>v<sub>n</sub></sup>(σ<sub>h</sub>), and at least one pair (D, β<sub>a</sub>) reaches that latter value.

Our guesses satisfy:

$$\mathbf{cVal}^{v_n}_{\neg\sigma(h)} - \mathbf{aVal}^{v_n}(\sigma_h) \ge \mathbf{Val}(\alpha_c \cdot \beta_c^\omega) - \mathbf{Val}(D(\star) \cdot \beta_a^\omega)\ \,,$$

and there is a choice of our guessed paths and SCD that gives exactly the left-hand side. Comparing the left-hand side with r can be done using an oracle for PosSLP, concluding the proof.

**Theorem 6.** *The following problem is decidable in* coNP<sup>NP<sup>PosSLP</sup></sup>*: Given: A game, a value* r ∈ ℚ *in binary. Question: Is* **Reg** > r*?*

*Proof.* To decide the problem at hand, we ought to check that *every* strategy has a regret value greater than r. However, since optipess strategies are regret-minimal, we need only check this for a class of strategies that contains the optipess strategies: bipositional switching strategies form one such class.

What is left to show is that optipess strategies can be encoded in *polynomial space*. Naturally, the two positional strategies contained in an optipess strategy can be encoded succinctly. We thus only need to show that, with t as in the definition of optipess strategies (page 5), t(v) is at most exponential for every v ∈ V<sub>∃</sub> with t(v) ∈ ℕ. This is shown in the long version of this paper.

**Theorem 7.** *The following problem is decidable in* coNP<sup>NP<sup>PosSLP</sup></sup>*: Given: A game, a bipositional switching strategy* σ*. Question: Is* σ *regret optimal?*

*Proof.* A consequence of the proof of Theorem 5 and of the existence of optipess strategies is that the value **Reg** of a game can be computed by a polynomial-size arithmetic circuit. Moreover, our reliance on PosSLP allows the input r of Theorem 5 to be represented as an arithmetic circuit without impacting the complexity. We can thus verify that, for all bipositional switching strategies σ′ (with sufficiently large threshold functions) and all possible polynomial-size arithmetic circuits, **Reg**(σ) > r implies **Reg**(σ′) > r. The latter holds if and only if σ is regret optimal since, as we argued in the proof of Theorem 6, such strategies σ′ include optipess strategies and thus regret-minimal strategies.

### **7 Conclusion**

We studied *regret*, a notion of interest for an agent that does not want to assume that the environment she plays in is simply adversarial. We showed that there are strategies that both minimize regret, and are not consistently worse than any other strategies. The problem of computing the minimum regret value of a game was then explored, and a better algorithm was provided for it.

The exact complexity of this problem remains however open. The only known lower bound, a straightforward adaptation of [14, Lemma 3] for discounted-sum games, shows that it is at least as hard as solving parity games [15].

Our upper bound could be significantly improved if we could efficiently solve the following problem:

### **PosRatBase**

**Given:** (a<sub>i</sub>)<sup>n</sup><sub>i=1</sub> ∈ ℤ<sup>n</sup>, (b<sub>i</sub>)<sup>n</sup><sub>i=1</sub> ∈ ℕ<sup>n</sup>, and r ∈ ℚ, all in binary.
**Question:** Is Σ<sup>n</sup><sub>i=1</sub> a<sub>i</sub> · r<sup>b<sub>i</sub></sup> > 0?

This can be seen as the problem of comparing succinctly represented numbers in a rational base. The PosSLP oracle in Theorem 5 can be replaced by an oracle for this seemingly simpler arithmetic problem. The variant of PosRatBase in which r is an integer was shown to be in P by Cucker, Koiran, and Smale [8], and they mention that the complexity is open for rational values. To the best of our knowledge, the exact complexity of PosRatBase is open even for n = 3.
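For concreteness, small PosRatBase instances can be decided by exact rational arithmetic; the catch is that the exponents b<sub>i</sub> are given in binary, so r<sup>b<sub>i</sub></sup> can be astronomically large. A brute-force sketch (function name is ours):

```python
# Hedged sketch of PosRatBase by exact evaluation. This is exponential
# in the bit size of the b_i, since r**b_i is computed in full; it only
# illustrates the problem statement, not an efficient algorithm.
from fractions import Fraction

def pos_rat_base(a, b, r):
    """Decide whether sum_i a_i * r**b_i > 0, exactly."""
    r = Fraction(r)
    return sum(ai * r**bi for ai, bi in zip(a, b)) > 0

# Example: 3*(1/2)^1 - 1*(1/2)^3 = 3/2 - 1/8 = 11/8 > 0.
```

When r is an integer, comparing the magnitudes of the leading terms suffices, which is essentially why the integer variant is in P; for rational r the terms can cancel to exponentially small quantities.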

**Acknowledgements.** We thank Raphaël Berthon and Ismaël Jecker for helpful conversations on the length of maximal (and minimal) histories in discounted-sum games, James Worrell and Joël Ouaknine for pointers on the complexity of comparing succinctly represented integers, and George Kenison for his writing help.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Causality in Linear Logic Full Completeness and Injectivity (Unit-Free Multiplicative-Additive Fragment)**

Simon Castellan and Nobuko Yoshida

Imperial College London, London, UK simon@phis.me

**Abstract.** Commuting conversions of Linear Logic induce a notion of dependency between rules inside a proof derivation: a rule depends on a previous rule when they cannot be permuted using the conversions. We propose a new interpretation of proofs of Linear Logic as *causal invariants* which captures *exactly* this dependency. We represent causal invariants using game semantics based on general event structures, carving out, inside the model of [6], a submodel of causal invariants. This submodel supports an interpretation of unit-free Multiplicative Additive Linear Logic with MIX (MALL−) which is (1) *fully complete*: every element of the model is the denotation of a proof and (2) *injective*: equality in the model characterises exactly commuting conversions of MALL−. This improves over the standard fully complete game semantics model of MALL−.

**Keywords:** Event structures · Linear Logic · Proof nets · Game semantics

## **1 Introduction**

*Proofs up to commuting conversions.* In the sequent calculus of Linear Logic, the order between rules need not always matter: allowed reorderings are expressed by *commuting conversions*. These conversions are necessary for confluence of cut-elimination by mitigating the sequentiality of the sequent calculus. The real proof object is often seen as an equivalence class of proofs modulo commuting conversions. The problem of providing a canonical representation of proofs up to those commuting conversions is as old as Linear Logic itself, and has proved to be a challenging problem. The traditional solution interprets a proof by a graphical representation called *proof net* and dates back to Girard [17]. Girard's solution is only satisfactory in the multiplicative-exponential fragment of Linear Logic. For additives, a well-known solution is due to Hughes and van Glabbeek [22], where proofs are reduced to their set of axiom linkings. However, the correctness criterion relies on the difficult *toggling* condition.

Proof nets tend to be based on specific representations such as graphs or sets of linkings. Denotational semantics has not managed to provide a semantic counterpart to proof nets, which would be a model where every element is

**Fig. 1.** Examples of causal invariants

the interpretation of a proof (*full completeness*) and whose equational theory coincides with commuting conversions (*injectivity*). We believe this is because denotational semantics views conversions as extensional principles, hence models proofs with extensional objects (relations, functions) too far from the syntax.

Conversions essentially state that the order between rules applied to different premises does not matter, as evidenced in the two equivalent proofs of the sequent ⊢ X<sup>⊥</sup> ⊕ X<sup>⊥</sup>, X ⊕ X depicted on the right. These two proofs are equal in extensional models of Linear Logic because *they have the same extensional behaviour*.

Unfortunately, characterising the image of the interpretation proved to be a difficult task in extensional models. The first fully complete models used game semantics, and

$$
\begin{array}{cc}
\dfrac{\dfrac{\overline{\vdash X^{\perp},\, X}\ \mathit{Ax}}{\vdash X^{\perp},\, X \oplus X}\ \oplus_1}{\vdash X^{\perp}\oplus X^{\perp},\, X \oplus X}\ \oplus_1
&\qquad
\dfrac{\dfrac{\overline{\vdash X^{\perp},\, X}\ \mathit{Ax}}{\vdash X^{\perp}\oplus X^{\perp},\, X}\ \oplus_1}{\vdash X^{\perp}\oplus X^{\perp},\, X \oplus X}\ \oplus_1
\end{array}
$$

are due to Abramsky and Melliès (MALL) [1] and Melliès (full LL) [24]. However, their models use an *extensional* quotient on strategies to satisfy the conversions, blurring the concrete nature of strategies.

*The true concurrency of conversions.* Recent work [5] highlights an interpretation of Linear Logic as communicating processes. Rules become actions whose polarity (input or output) is tied to the polarity of the connective (negative or positive), and cut-elimination becomes communication. In this interpretation, each assumption in the context is assigned a channel on which the proof communicates. Interestingly, commuting conversions can be read as asynchronous permutations. For instance, the conversion mentioned above becomes the equation in the syntax of Wadler [27]:

$$(1)\quad u[\mathtt{inl}].\,v[\mathtt{inl}].\,[u \leftrightarrow v] \;\equiv\; v[\mathtt{inl}].\,u[\mathtt{inl}].\,[u \leftrightarrow v] \;\rhd\; u : X^\perp \oplus X^\perp,\ v : X \oplus X,$$

where u[inl] corresponds to a ⊕<sub>1</sub>-introduction rule on (the assumption corresponding to) u, and [u ↔ v] is the counterpart of an axiom between the hypotheses corresponding to u and v. It then becomes natural to consider that the canonical object representing these two proofs should be a concurrent process issuing the two outputs in parallel. A notion of causality emerges from this interpretation, where a rule *depends on* a previous rule below it in the tree when these two rules cannot be permuted using the commuting conversions. This leads us to causal models that make this dependency explicit. For instance, the two processes in (1) can be represented as the partial order depicted in Fig. 1a, where dependency between rules is marked with arrows.

In the presence of &, a derivation stands for several executions (*slices*), given by the different premises of &-rules (whose process equivalent is u.case (P, Q) and represents pattern matching on an incoming message). The identity on X ⊕ Y, corresponding to the proof

$$u.\mathsf{case}\,(v[\mathsf{inl}].\,[u \leftrightarrow v],\ v[\mathsf{inr}].\,[u \leftrightarrow v]) \;\rhd\; u : X^\perp \,\&\, Y^\perp,\ v : X \oplus Y,$$

is interpreted by the *event structure* depicted in Fig. 1b. Event structures [28] combine a partial order, representing causality, with a conflict relation representing when two events cannot belong to the same execution (here, the same slice). Conflict here is indicated by a wavy line and separates the slices. The &-introduction becomes two conflicting events.

**Fig. 2.** Representations of or

*Conjunctive and disjunctive causalities.* Consider the process on the context u : (X ⊕ X)⊥, v : (Y ⊕ Y )⊥, w : (X ⊗ Y ) ⊕ (X ⊗ Y ) implementing disjunction:

$$\mathsf{or} = u.\mathsf{case}\begin{pmatrix} v.\mathsf{case}\,(w[\mathsf{inl}].P,\ w[\mathsf{inl}].P),\\ v.\mathsf{case}\,(w[\mathsf{inl}].P,\ w[\mathsf{inr}].P) \end{pmatrix}\ \text{where } P = w[x].\,([u \leftrightarrow w] \mid [v \leftrightarrow x]).$$

Cuts of or against a proof starting with u[inl] or v[inl] answer on w after reduction:

$$(\nu u)(\mathsf{or} \mid u[\mathsf{inl}]) \to^* w[\mathsf{inl}].\,v.\mathsf{case}\,(P,P) \qquad (\nu v)(\mathsf{or} \mid v[\mathsf{inl}]) \to^* w[\mathsf{inl}].\,u.\mathsf{case}\,(P,P)$$

where (νu)(P | Q) is the process counterpart of logical cuts. This operational behaviour is reminiscent of *parallel or*, which evaluates its arguments in parallel and returns true as soon as one of them does. Due to this intensional behaviour, the interpretation of or in prime event structures is nondeterministic (Fig. 2a), as causality in event structures is *conjunctive* (an event may only occur after *all* its predecessors have occurred). By moving to *general* event structures, however, we can make the disjunctive causality explicit and recover determinism (Fig. 2b).

*Contributions and outline.* Drawing inspiration from the interpretation of proofs in terms of processes, we build a fully complete and injective model of unit-free Multiplicative Additive Linear Logic with MIX (MALL−), interpreting proofs as general event structures living in a submodel of the model introduced by [6]. Moreover, our model captures the dependency between rules, which makes sequentialisation a local operation, unlike in proof nets, and has a more uniform acyclicity condition than [22].

We first recall the syntax of MALL<sup>−</sup> and its reading in terms of processes in Sect. 2. Then, in Sect. 3, we present a slight variation on the model of [6], where we call the (pre)strategies *causal structures*, by analogy with proof structures. Each proof tree can be seen as a (sequential) causal structure. However, the space of causal structures is too broad and there are many causal structures which do not correspond to any proofs. A major obstacle to sequentialisation is the presence of *deadlocks*. In Sect. 4, we introduce a condition on causal structures, ensuring deadlock-free composition, inspired by the interaction between ` and ⊗ in Linear Logic. Acyclic causal structures are still allowed to only explore partially the game, contrary to proofs which must explore it exhaustively, hence in Sect. 5, we introduce further conditions on causal structures, ensuring a *strong* sequentialisation theorem (Theorem 2): we call them *causal nets*. In Sect. 6, we define causal invariants as maximal causal nets. Every causal net embeds in a *unique* causal invariant; and a particular proof P embeds inside a unique causal invariant which forms its denotation P. Moreover, two proofs embed in the same causal invariant if and only if they are convertible (Theorem 4). Finally, we show how to equip causal invariants with the structure of ∗-autonomous category with products and deduce that they form a fully complete model of MALL<sup>−</sup> (Theorem 6) for which the interpretation is injective.

The proofs are available in the technical report [7].

### **2 MALL***<sup>−</sup>* **and Its Commuting Conversions**

In this section, we introduce MALL<sup>−</sup> formulas and proofs as well as the standard commuting conversions and cut elimination for this logic. As mentioned in the introduction, we use a process-like presentation of proofs following [27]. This highlights the communicating aspect of proofs which is an essential intuition for the model; and it offers a concise visualisation of proofs and conversions.

*Formulas.* We define the formulas of MALL−: T,S ::= X | X<sup>⊥</sup> | T ⊗ S | T ` S | T ⊕ S | T & S, where X and X<sup>⊥</sup> are *atomic formulas* (or *literals*) belonging to a set <sup>A</sup>. Formulas come with the standard notion of duality (·)<sup>⊥</sup> given by the De Morgan rules: ⊗ is dual to `, and ⊕ to &. An *environment* is a partial mapping of *names* to formulas, instead of a multiset of formulas – names disambiguate which assumption a rule acts on.

*Proofs as processes.* We see proofs of MALL<sup>−</sup> (with MIX) as typing derivations for a variant of the π-calculus [27]. The (untyped) syntax for the processes is as follows:

$$\begin{array}{lll} P,Q ::= & u(v).P \mid u[v].(P \mid Q) & \text{(multiplicatives)}\\ & \mid\ u.\mathsf{case}(P,Q) \mid u[\mathsf{inl}].P \mid u[\mathsf{inr}].P & \text{(additives)}\\ & \mid\ [u \leftrightarrow v] \mid (\nu u)(P \mid Q) \mid (P \mid Q) & \text{(logical and mix)} \end{array}$$

u(v).P denotes an input of v on channel u (used in `-introduction), while u[v].(P | Q) denotes the output of a fresh channel v along channel u (used in ⊗-introduction). The term [u ↔ v] is a *link*, forwarding messages received on u to v and conversely; it corresponds to axioms. (νu)(P | Q) represents a restriction of u in P and Q and corresponds to cuts; u.case (P, Q) is an input branching representing &-introductions, which interacts with selection, either u[inl].R or u[inr].R. In (νu)(P | Q), u is bound in both P and Q; in u(v).P, v is bound in P; and in u[v].(P | Q), v is only bound in Q.

We now define MALL<sup>−</sup> proofs as typing derivations for processes. The inference rules, recalled in Fig. 3, are from [27]. The links (axioms) are restricted to literals – for composite types, one can use the usual η-expansion laws. There is a straightforward bijection between standard (η-expanded) proofs of MALL<sup>−</sup> and typing derivations.


**Fig. 3.** Typing rules for MALL<sup>−</sup> (above) and contexts (below)

*Commutation rules and cut elimination.* We now explain the valid commutation rules in our calculus. We consider contexts C[[]<sub>1</sub>,..., []<sub>n</sub>] with several holes, to accommodate & which has two branches. Contexts are defined in Fig. 3 and are assigned a type: intuitively, if we plug proofs of the Γ<sub>i</sub> into the holes, we get back a proof of the context's type. We use the notation C[P<sub>i</sub>]<sub>i</sub> for C[P<sub>1</sub>,...,P<sub>n</sub>] when (P<sub>i</sub>) is a family of processes. Commuting conversion is the smallest congruence ≡ satisfying all well-typed instances of the rule C[D[P<sub>i,j</sub>]<sub>j</sub>]<sub>i</sub> ≡ D[C[P<sub>i,j</sub>]<sub>i</sub>]<sub>j</sub> for C and D two contexts. For instance, a[inl].b.case (P, Q) ≡ b.case (a[inl].P, a[inl].Q). Figure 4 gives the reduction rules P → Q. The first four rules are the *principal* cut rules and describe the interaction of two dual terms, while the last one allows cuts to move inside contexts.

# **3 Concurrent Games Based on General Event Structures**

This section introduces a slight variation on the model of [6]. In Sect. 3.1, we define *games* as prime event structures with polarities, which are used to interpret formulas. We then introduce general event structures in Sect. 3.2, which are used to define causal structures.

**Fig. 4.** Cut elimination in MALL<sup>−</sup>

#### **3.1 Games as Prime Event Structures with Polarities**

*Definition of games.* Prime event structures [28] (simply event structures in the rest of the paper) are a causal model of nondeterministic and concurrent computation. We use here prime event structures *with binary conflict*. An **event structure** is a triple (E, ≤<sub>E</sub>, #<sub>E</sub>) where (E, ≤<sub>E</sub>) is a partial order and #<sub>E</sub> is an irreflexive symmetric relation (representing **conflict**) satisfying: (1) if e ∈ E, then [e] := {e′ ∈ E | e′ ≤<sub>E</sub> e} is finite; and (2) if e #<sub>E</sub> e′ and e′ ≤<sub>E</sub> e″, then e #<sub>E</sub> e″. We often omit the E subscripts when clear from the context.

A **configuration** of E is a downclosed subset of E which does not contain two conflicting events. We write *C*(E) for the set of *finite* configurations of E. For any e ∈ E, [e] is a configuration, and so is [e) := [e] \ {e}. We write e → e′ for the immediate causal relation of E, defined as e <<sub>E</sub> e′ with no event in between. Similarly, a conflict e # e′ is **minimal**, denoted e ∼ e′, when [e] ∪ [e′) and [e) ∪ [e′] are configurations. When drawing event structures, only → and ∼ are represented. We write max(E) for the set of maximal events of E for ≤<sub>E</sub>. An event e is maximal in x when it has no successor for ≤<sub>E</sub> in x. We write max<sub>E</sub> x for the maximal events of a configuration x ∈ *C*(E).
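These definitions are concrete enough to compute with on finite structures. A small sketch (our own illustration, not the paper's) enumerating the finite configurations of a finite event structure:

```python
# Hedged sketch: configurations of a finite event structure, i.e.
# downclosed, conflict-free subsets of events. The triple encoding
# (events, le, conflict) is our own convention.
from itertools import combinations

def configurations(events, le, conflict):
    """le holds pairs (d, e) meaning d <= e; conflict holds conflicting pairs."""
    result = []
    for k in range(len(events) + 1):
        for xs in combinations(events, k):
            x = set(xs)
            downclosed = all(d in x for (d, e) in le if e in x)
            conflict_free = all(not {d, e} <= x for (d, e) in conflict)
            if downclosed and conflict_free:
                result.append(frozenset(x))
    return result

# A cell {l, r} of two conflicting events, each enabling one continuation:
# the configurations are {}, {l}, {r}, {l, p}, {r, q}.
cfgs = configurations(['l', 'r', 'p', 'q'],
                      {('l', 'p'), ('r', 'q')},
                      {('l', 'r')})
```

Note how conflict propagates implicitly: {l, q} is excluded not by conflict but by downclosure, since q requires r.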

An event structure E is **confusion-free** when (1) for all minimal conflicts e ∼ e′, we have [e) = [e′); and (2) if e ∼ e′ and e′ ∼ e″, then e = e″ or e ∼ e″. As a result, the relation ∼ (extended reflexively) is an equivalence relation whose equivalence classes a are called **cells**.

**Definition 1.** *A game is a confusion-free event structure* A *along with an assignment pol* : A → {−, +} *such that cells contain events of the same polarity, and a function atom* : max(A) → 𝒜 *mapping every maximal event of* A *to an atom. Events with polarity* − *(resp.* +*) are negative (resp. positive).*

Events of a game are usually called *moves*. The restriction imposes branching to be polarised (*i.e.* belonging to a player). A game is **rooted** when any two minimal events are in conflict. Single types are interpreted by rooted games, while contexts are interpreted by arbitrary games. When introducing moves of a game, we will indicate their polarity in exponent, *e.g.* "let a<sup>+</sup> ∈ A" stands for assuming a positive move of A.

*Interpretation of formulas.* To interpret formulas, we make use of standard constructions on prime event structures. The event structure a·E is E prefixed with a, *i.e.* E ∪ {a} where *all* events of E depend on a. The parallel composition of E and E′ represents parallel executions of E and E′ without interference:

**Definition 2.** *The parallel composition of event structures* A<sub>0</sub> *and* A<sub>1</sub> *is the event structure* A<sub>0</sub> ∥ A<sub>1</sub> = ({0} × A<sub>0</sub> ∪ {1} × A<sub>1</sub>, ≤<sub>A<sub>0</sub>∥A<sub>1</sub></sub>, #<sub>A<sub>0</sub>∥A<sub>1</sub></sub>) *with* (i, a) ≤<sub>A<sub>0</sub>∥A<sub>1</sub></sub> (j, a′) *iff* i = j *and* a ≤<sub>A<sub>i</sub></sub> a′*; and* (i, a) #<sub>A<sub>0</sub>∥A<sub>1</sub></sub> (j, a′) *when* i = j *and* a #<sub>A<sub>i</sub></sub> a′*.*

The sum of event structure E + F is the nondeterministic analogue of parallel composition.

**Definition 3.** *The sum* A<sub>0</sub> + A<sub>1</sub> *of two event structures* A<sub>0</sub> *and* A<sub>1</sub> *has the same partial order as* A<sub>0</sub> ∥ A<sub>1</sub>*, and conflict relation* (i, a) #<sub>A<sub>0</sub>+A<sub>1</sub></sub> (j, a′) *iff* i ≠ j*, or* i = j *and* a #<sub>A<sub>i</sub></sub> a′*.*

Prefixing, parallel composition and sum of event structures extend to games. The dual of a game A, obtained by reversing the polarity labelling, is written A⊥. Given x ∈ *C* (A), we define A/x ("A after x") as the subgame of A comprising the events a ∈ A \ x not in conflict with events in x.
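Definitions 2 and 3 differ only in their conflict relations, which a short sketch makes plain (the triple encoding of event structures as (events, le, conflict) is our own convention, not from the paper):

```python
# Hedged sketch of Definitions 2 and 3 on finite event structures.
# Tags 0/1 keep the two components apart, as in the formal definitions.

def parallel(A0, A1):
    """A0 || A1 (Definition 2): order and conflict stay within components."""
    parts = ((0, A0), (1, A1))
    events = [(i, a) for i, A in parts for a in A[0]]
    le = {((i, d), (i, e)) for i, A in parts for (d, e) in A[1]}
    conflict = {((i, d), (i, e)) for i, A in parts for (d, e) in A[2]}
    return events, le, conflict

def esum(A0, A1):
    """A0 + A1 (Definition 3): additionally, events of distinct
    components are in conflict, making the choice exclusive."""
    events, le, conflict = parallel(A0, A1)
    cross = {(d, e) for d in events for e in events if d[0] != e[0]}
    return events, le, conflict | cross

A = (['a'], set(), set())  # single-event structures
B = (['b'], set(), set())
```

On the single-event structures A and B, parallel composition has the conflict-free configuration {(0, a), (1, b)}, while the sum forbids it: the two components become the branches of a nondeterministic choice.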

*Interpretation of formulas.* The interpretation of the atom X is the game with a single positive event, simply written X, with *atom*(X) = X; the interpretation of X<sup>⊥</sup> is its dual, written simply X<sup>⊥</sup> in diagrams. For composite formulas, we let (where send, inl and inr are simply labels):


Parallel composition is used to interpret contexts: ⟦u<sub>1</sub> : T<sub>1</sub>,...,u<sub>n</sub> : T<sub>n</sub>⟧ = ⟦T<sub>1</sub>⟧ ∥ ... ∥ ⟦T<sub>n</sub>⟧. The interpretation commutes with duality: ⟦T<sup>⊥</sup>⟧ = ⟦T⟧<sup>⊥</sup>.

In diagrams, we write moves of a context following the syntactic convention: for instance u[inl] denotes the minimal inl move of the u component. For tensors and pars, we use the notation u[v] and u(v) to make explicit the variables we use in the rest of the diagram, instead of send<sup>+</sup> and send<sup>−</sup> respectively. For atoms, we use u : X and u : X⊥.

#### **3.2 Causal Structures as Deterministic General Event Structures**

As we discussed in Sect. 1, prime event structures cannot express disjunctive causalities deterministically, hence fail to account for the determinism of LL. Our notion of causal structure is based on *general event structures*, which allow more complex causal patterns. We use a slight variation on the definition of deterministic general event structures given by [6], to ensure that composition is well-defined without further assumptions.

Instead of using the more concrete representation of general event structures in terms of a set of events and an enabling relation, we use the following formulation in terms of set of configurations, more adequate for mathematical reasoning. Being only sets of configurations, they can be reasoned on with very simple set-theoretic arguments.

**Definition 4.** *A causal structure (abbreviated as causal struct) on a game* A *is a subset* σ ⊆ *C* (A) *containing ∅ and satisfying the following conditions:*


Configurations of prime event structures satisfy a further axiom, *stability*, which ensures the absence of disjunctive causalities. When σ is a causal struct on A, we write σ : A. We draw causal structs as regular event structures, using → and ∼. To indicate disjunctive causalities, we annotate joins with **or**. This convention is not powerful enough to draw *all* causal structs, but is enough for the examples in this paper. As an example, on A = a ∥ b ∥ c, the diagram on the right denotes the causal struct σ = {x ∈ *C*(A) | c ∈ x ⇒ x ∩ {a, b} ≠ ∅}.

A **minimal event** of σ : A is an event a ∈ A with {a} ∈ σ. An event a ∈ x ∈ σ is **maximal** in x when x \ {a} ∈ σ. A **prime configuration** of a ∈ A is a configuration x ∈ σ such that a is its unique maximal event. Because of disjunctive causalities, an event a ∈ A can have several distinct prime configurations in σ (unlike in event structures). In the previous example, since c can be caused by either a or b, it has two prime configurations: {a, c} and {b, c}. We write max σ for the set of **maximal configurations** of σ, i.e. those configurations that cannot be further extended.
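Since causal structs are just sets of configurations, the running example can be written out directly and its prime configurations computed (a sketch under our own encoding):

```python
# The running example on A = a || b || c:
# sigma = {x in C(A) | c in x implies x meets {a, b}}.
sigma = [frozenset(s) for s in
         [set(), {'a'}, {'b'}, {'a', 'b'},
          {'a', 'c'}, {'b', 'c'}, {'a', 'b', 'c'}]]

def prime_configurations(sigma, e):
    """Configurations of sigma in which e is the unique maximal event
    (an event m is maximal in x when x - {m} is still in sigma)."""
    return [x for x in sigma
            if e in x
            and x - {e} in sigma            # e is maximal in x...
            and all(x - {m} not in sigma    # ...and no other event is
                    for m in x if m != e)]
```

The configuration {a, b, c} is not prime for c: there, a (and b) is also maximal, since removing it leaves a configuration.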

Even though causality is less clear in general event structures than in prime event structures, we give here a notion of immediate causal dependence that will be central to define acyclic causal structs. Given a causal struct σ : A and x ∈ σ, we define a relation →<sub>x,σ</sub> on x as follows: a →<sub>x,σ</sub> a′ when there exists a prime configuration y of a′ such that x ∪ y ∈ σ, and a is maximal in y \ {a′}. This notion is compatible with the drawing above: we have a →<sub>∅</sub> c and b →<sub>∅</sub> c, as c has two prime configurations: {a, c} and {b, c}. Causality needs to be contextual, since different slices can implement different causal patterns. Parallel composition and prefixing extend to causal structs:

$$\sigma \parallel \tau = \{ x \parallel y \in \mathscr{C}(A \parallel B) \mid (x, y) \in \sigma \times \tau \} \qquad a \cdot \sigma = \{ x \in \mathscr{C}(a \cdot A) \mid x \cap A \in \sigma \}.$$

*Categorical setting.* Causal structs can be composed using the definitions of [6]. Consider σ : A<sup>⊥</sup> ∥ B and τ : B<sup>⊥</sup> ∥ C. A **synchronised configuration** is a configuration x ∈ *C*(A ∥ B ∥ C) such that x ∩ (A ∥ B) ∈ σ and x ∩ (B ∥ C) ∈ τ. A synchronised configuration x is **reachable** when there exists a sequence (a **covering chain**) of synchronised configurations x<sub>0</sub> = ∅ ⊆ x<sub>1</sub> ⊆ ... ⊆ x<sub>n</sub> = x such that x<sub>i+1</sub> \ x<sub>i</sub> is a singleton. The reachable configurations are used to define the interaction τ ⊛ σ, and then, after hiding, the composition τ ⊙ σ:

$$\tau \circledast \sigma = \{x \mid x \text{ is a reachable synchronised configuration}\} \qquad\qquad \tau \odot \sigma = \{ x \cap (A \parallel C) \mid x \in \tau \circledast \sigma \}.$$
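Reachability by covering chains is a simple search that grows a configuration one event at a time. The sketch below (our own finite modelling) also shows how deadlocks are discarded: a configuration may be synchronised yet unreachable because no singleton-step path leads to it.

```python
def reachable(configs):
    """Synchronised configurations reachable from ∅ by covering chains,
    i.e. by adding one event at a time while staying inside `configs`."""
    seen = {frozenset()}
    frontier = [frozenset()]
    while frontier:
        x = frontier.pop()
        for y in configs:
            if y not in seen and len(y) == len(x) + 1 and x < y:
                seen.add(y)
                frontier.append(y)
    return seen

# {a, b} is synchronised, but neither {a} nor {b} is: no covering chain
# reaches it (a deadlock), so it does not enter the interaction.
configs = {frozenset(), frozenset({"a", "b"})}
assert reachable(configs) == {frozenset()}
```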

Unlike in [6], our determinism is strong enough for τ ⊙ σ to be a causal struct.

**Lemma 1.** *If* σ : A⊥ ∥ B *and* τ : B⊥ ∥ C *are causal structs, then* τ ⊙ σ *is a causal struct.*

Composition of causal structs will be used to interpret cuts between proofs of Linear Logic. In concurrent game semantics, composition has a natural identity: asynchronous copycat [25], playing on the game A⊥ ∥ A and forwarding negative moves on one side to the corresponding positive occurrence on the other side. Following [6], we define cc_A = {x ∥ y ∈ *C*(A⊥ ∥ A) | y ⊇⁻ x ∩ y ⊆⁺ x}, where x ⊆^p y means x ⊆ y and *pol*(y \ x) ⊆ {p}.

However, in general copycat is not an identity on all causal structs; only σ ⊆ cc_A ⊙ σ holds. Indeed, copycat represents an asynchronous buffer, and causal structs which expect messages to be transmitted synchronously may be affected by composition with copycat. We call causal structs that satisfy the equality **asynchronous**. From [6], we know that asynchronous causal structs form a compact closed category.

*The syntactic tree.* The syntactic tree of a derivation can be read as a causal struct *Tr*(P) on Γ, which will be the basis for our interpretation. It is defined by induction:

$$\begin{aligned} \operatorname{Tr}(u(v).P) &= u(v) \cdot \operatorname{Tr}(P) & \operatorname{Tr}(u[v].(P \mid Q)) &= u[v] \cdot (\operatorname{Tr}(P) \parallel \operatorname{Tr}(Q)) \\ \operatorname{Tr}(a.\mathsf{case}\,(P,Q)) &= (a(\mathsf{inl}) \cdot \operatorname{Tr}(P)) \cup (a(\mathsf{inr}) \cdot \operatorname{Tr}(Q)) \\ \operatorname{Tr}(a[\mathsf{inl}].P) &= a[\mathsf{inl}] \cdot \operatorname{Tr}(P) & \operatorname{Tr}(a[\mathsf{inr}].P) &= a[\mathsf{inr}] \cdot \operatorname{Tr}(P) \\ \operatorname{Tr}([a \leftrightarrow b]) &= \mathrm{cc}_{\llbracket X \rrbracket} \text{ where } \Gamma = a: X^{\perp},\, b: X & \operatorname{Tr}(P \mid Q) &= \operatorname{Tr}(P) \parallel \operatorname{Tr}(Q) \\ \operatorname{Tr}((\nu a)(P \mid Q)) &= \operatorname{Tr}(P) \odot \operatorname{Tr}(Q) \end{aligned}$$

We use the conventions of the diagrams above: for instance, u[v] denotes the initial send move of the u component. An example of this construction is given in Fig. 5a. Note that it is not asynchronous.

### **4 Acyclicity of Causal Structures**

The space of causal structs is unfortunately too broad to provide a notion of causal nets, due in particular to the possibility of deadlocks during composition. As a first step towards defining causal nets, we introduce in this section a condition on causal structs inspired by the tensor rule of Linear Logic. In Sect. 4.1, we propose a notion of communication between actions, based on causality. In Sect. 4.2, we introduce a notion of acyclicity which is shown to be stable under composition and to ensure deadlock-free composition.

#### **4.1 Communication in Causal Structures**

The tensor rule of Linear Logic says that after a tensor u[v], the proof splits into two independent subproofs, one handling u and the other v. This syntactic condition ensures that there is no communication between u and v. More precisely, we want to prevent any causal dependence between subsequent actions on u and subsequent actions on v. Indeed, such a dependence could create a deadlock when facing a par rule u(v), which is allowed to put arbitrary dependences between such subsequent actions.

*Communication in MLL.* Let us start with the case of MLL, which corresponds to the case where games do not have conflicts. Consider the following three causal structs:

The causal structs σ₁ and σ₂ play on the game u : X⊥ ⊗ Y⊥, v : X ⅋ Y, while σ₃ plays on the game u : X⊥ ⊗ Y⊥, v : X ⊗ Y. The causal structs σ₂ and σ₃ are very close to proof nets, and it is easy to see that σ₂ represents a correct proof net while σ₃ does not. In particular, there exists a proof P such that *Tr*(P) ⊆ σ₂, but there is no such proof Q for σ₃. Clearly, σ₃ should not be acyclic. But should σ₂ be? After all, it is sequentialisable. However, in all sequentialisations of σ₂, the par rule v(z) is applied *before* the tensor u[w], and this dependency is not reflected in σ₂. Since our goal is exactly to compute these implicit dependencies, we will only consider σ₁ to be acyclic, by using a stronger sequentialisation criterion:

**Definition 5.** *A causal struct* σ : Γ *is* strongly sequentialisable *when for all* x ∈ σ *there exists a proof* P *of* Γ *with* x ∈ *Tr*(P) *and Tr*(P) ⊆ σ*.*

To understand the difference between σ<sup>1</sup> and σ2, we need to look at causal chains. In both σ<sup>1</sup> and σ2, we can go from u : X<sup>⊥</sup> to w : Y <sup>⊥</sup> by following immediate causal links in any direction, but observe that in σ<sup>1</sup> they must all cross an event below u[w] (namely v(z) or u[w]). This prompts us to define a notion of communication *outside a configuration* x:

**Definition 6.** *Given* σ : A *and* x ∈ σ*, we say that* a, a′ ∈ A \ x communicate outside x *(written* a ↭_{x,σ} a′*) when there exists a chain* a ⇌_{x,σ} a₀ ⇌_{x,σ} ··· ⇌_{x,σ} aₙ ⇌_{x,σ} a′ *where all the* aᵢ ∈ A \ x*, and* ⇌_{x,σ} *denotes the symmetric closure of* ⇀_{x,σ}*.*
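Definition 6 can likewise be run on the earlier finite example. The chain search below is a sketch (all names are ours); it follows the symmetric closure of ⇀ through events outside x.

```python
def prime_configs(sigma, b):
    """Configurations of sigma in which b is the unique maximal event."""
    return [x for x in sigma
            if b in x and (x - {b}) in sigma
            and all((x - {e}) not in sigma for e in x if e != b)]

def immediate(sigma, x, a, b):
    """a ⇀_{x,σ} b: some prime configuration y of b has x ∪ y ∈ σ
    with a maximal in y \\ {b}."""
    return any((x | y) in sigma and a in y and (y - {b} - {a}) in sigma
               for y in prime_configs(sigma, b))

def communicate(sigma, events, x, a, b):
    """a ↭_{x,σ} b: chain of ⇀-steps (in either direction) whose
    events all lie outside x; a and b are assumed outside x."""
    outside = events - x
    seen, todo = {a}, [a]
    while todo:
        cur = todo.pop()
        for nxt in outside - seen:
            if immediate(sigma, x, cur, nxt) or immediate(sigma, x, nxt, cur):
                seen.add(nxt)
                todo.append(nxt)
    return b in seen

sigma = {frozenset(s) for s in
         [(), ("a",), ("b",), ("a", "b"),
          ("a", "c"), ("b", "c"), ("a", "b", "c")]}
events = {"a", "b", "c"}

# a and b communicate outside ∅ through c (a ⇀ c and b ⇀ c):
assert communicate(sigma, events, frozenset(), "a", "b")
```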

*Communication in MALL.* In the presence of additives, immediate causality is not the only vector of communication. Consider the following causal struct σ₄, playing on the context u : (A & A) ⊗ (A & A), v : (A ⊕ A) & (A ⊕ A), where A is irrelevant:

This pattern is not strongly sequentialisable: the tensor u[w] must always come after the &-introduction on v, since we need this information to know whether v should go with u or with w when splitting the context. Yet, it is not possible to find a communication path from one side to the other by following purely causal links without crossing u[w]. There is, however, a path that uses both immediate causality and *minimal conflict*. This means that we should identify events in minimal conflict, since they represent the same &-introduction rule. Concretely, this amounts to lifting the previous definitions to the level of cells. Given a causal struct σ : A and x ∈ σ, along with two cells a, a′ of A/x, we define a ⇀_{x,σ} a′ when some event of a is related by ⇀_{x,σ} to some event of a′; and a ↭_{x,σ} a′ when there exists a chain a ⇌_{x,σ} a₀ ⇌_{x,σ} ··· ⇌_{x,σ} aₙ ⇌_{x,σ} a′ of cells, none of which intersects x. For instance, the two cells which are successors of the tensor u[w] in σ₄ communicate outside the configuration {u[w]} by going through the cell {v(inl), v(inr)}.

#### **4.2 Definition of Acyclicity on Causal Structures**

Since games are trees, two events a, a′ are either incomparable or have a meet a ∧ a′. If a ∧ a′ is defined and positive, we say that a and a′ **have positive meet**: they sit on two distinct branches of a tensor. If a ∧ a′ is undefined, or defined and negative, we say that a and a′ **have negative meet**. When the meet is undefined, a and a′ are events of different components of the context; we consider the meet to be negative in this case, since the components of a context are related by an implicit par.
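Since each game is a forest, the meet of two events is computable by walking up ancestor chains. A small sketch (the representation is ours): the parent map encodes the forest, and a missing parent marks a root.

```python
def meet(parent, a, b):
    """Deepest common ancestor of a and b, or None when they belong to
    different trees of the forest (the meet is then undefined)."""
    ancestors = set()
    while True:
        ancestors.add(a)
        if a not in parent:
            break
        a = parent[a]
    while True:
        if b in ancestors:
            return b          # first hit walking up from b is the deepest
        if b not in parent:
            return None
        b = parent[b]

# A tensor move u with two branches x, y, and an unrelated component v:
parent = {"x": "u", "y": "u"}
assert meet(parent, "x", "y") == "u"   # positive meet if u is positive
assert meet(parent, "x", "v") is None  # undefined: negative by convention
```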

These definitions are easily extended to cells. The meet a ∧ a′ of two cells a and a′ of A is the meet of any pair of representative events, one from each cell: by confusion-freeness, it does not matter which ones are chosen. Similarly, we say that a and a′ have positive meet if a ∧ a′ is defined and positive, and negative meet otherwise. These definitions formalise the idea of "the two sides of a tensor", and allow us to define acyclicity.

**Definition 7.** *A causal struct* σ : A *is* acyclic *when for all* x ∈ σ *and all cells* a, a′ *not intersecting* x *and with positive meet, if* a ↭_{x,σ} a′ *then* a ∧ a′ ∉ x*.*

This captures the desired intuition: if a and a′ lie on the two sides of a tensor (i.e. have positive meet), and there is a communication path outside x relating them, then that tensor must also be outside x (and, implicitly, the communication path must go through it).

Reasoning about the interaction of acyclic strategies proved to be challenging. We prove that acyclic causal structs compose, and that their interaction is deadlock-free, when composition is over a rooted game B. This crucial assumption arises from the fact that in Linear Logic, cuts are on *formulas*. It entails that for any b, b′ ∈ B, the meet b ∧ b′ is defined, hence must be positive either from the point of view of σ or from that of τ.

**Theorem 1.** *For acyclic causal structs* σ : A⊥ ∥ B *and* τ : B⊥ ∥ C*: (1) their interaction is* deadlock-free*, i.e.* τ ⊛ σ = (σ ∥ C) ∩ (A ∥ τ)*; and (2) the causal struct* τ ⊙ σ *is acyclic.*

As a result, acyclic and asynchronous causal structs form a category. We believe this intermediate category is interesting in its own right, since it generalises the deadlock-freeness argument of Linear Logic without assuming other constraints coming from Linear Logic, such as linearity. In the next section, we study further restrictions on acyclic causal structs which guarantee strong sequentialisability.

# **5 Causal Nets and Sequentialisation**

We are now ready to introduce causal nets. In Sect. 5.1 we give their definition by restricting acyclic causal structs, and in Sect. 5.2 we prove that causal nets are strongly sequentialisable.

#### **5.1 Causal Nets: Totality and Well-Linking Causal Structs**

To ensure that our causal structs are strongly sequentialisable, acyclicity is not enough. First, we need to require causal structs to respect the linearity discipline of Linear Logic:

**Definition 8.** *A causal struct* σ : A *is* total *when (1) for* x ∈ σ*, if* x *is maximal in* σ *then it is maximal in C*(A)*; and (2) for* x ∈ σ *and* a⁻ ∈ A \ x *such that* x ∪ {a} ∈ σ*, whenever* a′ *belongs to the same cell as* a*, we also have* x ∪ {a′} ∈ σ*.*

The first condition forces a causal struct to play until no moves are left to play, and the second forces a causal struct to be receptive to all Opponent choices, not just a subset of them.

Our last condition constrains axiom links. A **linking** of a game A is a pair (x, ℓ) of a configuration x ∈ max *C*(A) and a bijection ℓ : (max_A x)⁻ ≃ (max_A x)⁺ preserving the *atom* labelling.

**Definition 9.** *A total causal struct* σ : A *is* well-linking *when for each* x ∈ max(σ) *there exists a linking* ℓ_x *of* x *such that if* y *is a prime configuration of* ℓ_x(e) *in* x*, then* max(y \ {ℓ_x(e)}) = {e}*.*

This ensures that every positive atom has a unique predecessor which is a negative atom.

**Definition 10.** *A* causal net *is an acyclic, total and well-linking causal struct.* A causal net σ : A induces a set of linkings of A, link(σ) := {ℓ_x | x ∈ max σ}. The mapping link(·) maps causal nets to the proof nets of [22].

### **5.2 Strong Sequentialisation of Causal Nets**

Our proof of sequentialisation relies on an induction on causal nets. To this end, we provide an inductive deconstruction of parallel proofs. Consider a causal net σ : A and a minimal event a of σ which is not an atom. We write A/a for A/{a}. Observe that if A = ⟦Γ⟧ for a context Γ, it is easy to see that there exists a context Γ/a such that A/a = ⟦Γ/a⟧. Given a causal struct σ : A, we define the causal struct σ/a = {x ∈ *C*(A/a) | x ∪ {a} ∈ σ} : A/a.

**Lemma 2.** σ/a *is a causal net on* A/a*.*

When a is positive, we can further decompose σ/a into disjoint parts, thanks to acyclicity. Write a₁, ..., aₙ for the minimal cells of A/a and consider, for n ≥ k > 0, A_k = {a′ ∈ A/a | cell(a′) ↭_{{a},σ} a_k}. A_k contains the events of A/a which σ connects to the k-th successor of a. We also define the set A₀ = A/a \ ⋃_{1≤k≤n} A_k of events not connected to any successor of a (this can happen with MIX). It inherits a game structure from A.

Each subset inherits a game structure from A/a. By acyclicity of σ, the A_k are pairwise disjoint, so A/a ≅ A₀ ∥ ... ∥ Aₙ. For 0 ≤ k ≤ n, define σ_k = *C*(A_k) ∩ σ/a.

**Lemma 3.** σ_k *is a causal net on* A_k *and we have* σ/a = σ₀ ∥ ... ∥ σₙ*.*

This formalises the intuition that after a tensor, an acyclic causal net must be a parallel composition of proofs (following the syntactic shape of the tensor rule of Linear Logic). From this result, we show by induction that any causal net is strongly sequentialisable.

**Theorem 2.** *If* σ : A *is a causal net, then* σ *is strongly sequentialisable.*

We believe sequentialisation without MIX requires causal nets to be *connected*: two cells with negative meet always communicate outside any configuration from which they are absent. We leave this direction for future work.

# **6 Causal Invariants and Completeness**

Causal nets are naturally ordered by inclusion. When σ ⊆ τ, we can regard τ as a less sequential implementation of σ. Two causal nets that are upper-bounded by a causal net should represent the same proof, but with varying degrees of sequentiality. Causal nets which are maximal for inclusion (among causal nets) are hence the most parallel implementations of a certain behaviour, and capture our intuition of causal invariants.

**Definition 11.** *A causal invariant is a causal net* σ : A *maximal for inclusion.*

#### **6.1 Causal Invariants as Maximal Causal Nets**

We start by characterising when two causal nets are upper-bounded for inclusion:

#### **Proposition 1.** *Given two causal nets* σ, τ : A*, the following are equivalent:*


*In this case we write* σ ↑ τ *and* σ ∨ τ *is the least upper bound of* σ *and* τ *for* ⊆*.*

It is a direct consequence of Proposition 1 that any causal net σ is included in a unique causal invariant σ↑ : A, defined as σ↑ = ⋃_{σ ⊆ τ} τ, where τ ranges over causal nets.

**Lemma 4.** *For* σ, τ : A *causal nets,* σ ↑ τ *iff* σ<sup>↑</sup> = τ <sup>↑</sup>*. Moreover, if* σ *and* τ *are causal invariants,* σ ↑ τ *if and only if* σ = τ *.*

**Fig. 5.** Interpreting P = u(u′). v(v′). w[w′]. ([u′ ↔ w′] | ([w ↔ v] | [u ↔ v′])) in the context u : X ⅋ Z⊥, v : Z ⅋ Y, w : X⊥ ⊗ Y⊥

The interpretation of a proof is simply defined as ⟦P⟧ = *Tr*(P)↑. Figure 5c illustrates the construction on a proof of MLL + MIX. The interpretation features a disjunctive causality, as the tensor can be introduced as soon as *one* of the two pars has been.

Defining link(P) = link(*Tr*(P)), we have from Lemma 4 that link(P) = link(Q) if and only if ⟦P⟧ = ⟦Q⟧. This implies that our model has the same equational theory as the proof nets of [22]. Such proof nets are already complete:

**Theorem 3 ([22]).** *For* P, Q *two proofs of* Γ*, we have* P ≡ Q *iff link*(P) = *link*(Q)*.*

As a corollary, we get:

**Theorem 4.** *For cut-free proofs* P, Q *we have* P ≡ Q *iff* ⟦P⟧ = ⟦Q⟧*.*

The technical report [7] also provides an inductive proof not using the result of [22]. A consequence of this result, along with *strong* sequentialisation, is: ⟦P⟧ = ⋃_{Q ≡ P} *Tr*(Q). This equality justifies our terminology of "causal completeness": for instance, it implies that the minimal events of ⟦P⟧ correspond exactly to the rules of P that can be pushed to the front using the commuting conversions.

#### **6.2 The Category of Causal Invariants**

So far we have focused on the static side. Can we also integrate the dynamic aspect of proofs? In this section, we show that causal invariants organise themselves into a category. First, we show that causal nets are stable under composition:

**Lemma 5.** *If* σ : A⊥ ∥ B *and* τ : B⊥ ∥ C *are causal nets, then so is* τ ⊙ σ*.*

Note that totality requires acyclicity (and deadlock-freedom) in order to be stable under composition. However, causal invariants are not stable under composition: τ ⊙ σ might not be maximal, even if τ and σ are. Indeed, during the interaction, some branches of τ will not be explored by σ and vice versa, which can lead to new allowed reorderings. However, we can always embed τ ⊙ σ into (τ ⊙ σ)↑:

**Lemma 6.** *Rooted games and causal invariants form a category* **CInv***, where the composition of* σ : A⊥ ∥ B *and* τ : B⊥ ∥ C *is* (τ ⊙ σ)↑ *and the identity on* A *is* cc_A↑*.*

Note that the empty game is an object of **CInv**, as we need a monoidal unit.

*Monoidal-closed structure.* Given two games A and B, we define A ⊗ B as send⁺ · (A ∥ B), and 1 as the empty game. There are obvious isomorphisms A ⊗ 1 ≅ A and A ⊗ (B ⊗ C) ≅ (A ⊗ B) ⊗ C in **CInv**. We now show how to compute the functorial action of ⊗ directly, without resorting to (·)↑. Consider σ ∈ **CInv**(A, B) and τ ∈ **CInv**(C, D). Given x ∈ *C*((A ⊗ C)⊥ ∥ (B ⊗ D)), we define x⟨σ⟩ = x ∩ (A⊥ ∥ B) and x⟨τ⟩ = x ∩ (C⊥ ∥ D). If x⟨σ⟩ ∈ σ and x⟨τ⟩ ∈ τ, we say that x is **connected** when there exist cells a, b, c and d of A, B, C and D respectively such that a ↭_{x⟨σ⟩,σ} b and c ↭_{x⟨τ⟩,τ} d. We define:

$$\sigma \otimes \tau = \left\{ \begin{array}{l} x \in \mathscr{C}\left( (A \otimes C)^{\perp} \parallel (B \otimes D) \right) \text{ such that:} \\ \quad \text{(1) } x\langle \sigma \rangle \in \sigma \text{ and } x\langle \tau \rangle \in \tau \\ \quad \text{(2) if } x \text{ is connected and contains } \mathbf{send}^{+}, \text{ then } \mathbf{send}^{-} \in x \end{array} \right\}$$

In (2), send⁻ refers to the minimal move of (A ⊗ C)⊥ and send⁺ to that of B ⊗ D. Condition (2) ensures that σ ⊗ τ is acyclic.

**Lemma 7.** *The tensor product defines a symmetric monoidal structure on CInv.*

Define A ⅋ B = (A⊥ ⊗ B⊥)⊥, ⊥ = 1 = ∅, and A ⊸ B = A⊥ ⅋ B.

**Lemma 8.** *We have a bijection* ⅋_{B,C} *between causal invariants on* A ∥ B ∥ C *and on* A ∥ (B ⅋ C)*. As a result, there is an adjunction* A ⊗ (−) ⊣ A ⊸ (−)*.*

Lemma 8 implies that **CInv**((A ⊸ ⊥) ⊸ ⊥) ≃ **CInv**(A), and **CInv** is ∗-autonomous.

*Cartesian products.* Given two games A, B in **CInv**, we define their product A & B = inl⁻ · A + inr⁻ · B. We show how to construct the pairing of two causal invariants concretely. Given σ ∈ **CInv**(A, B) and τ ∈ **CInv**(A, C), we define the common behaviour of σ and τ on A to be those x ∈ *C*(A⊥) ∩ σ ∩ τ such that for all cells a, a′ outside of x with positive meet, a ↭_{x,σ} a′ iff a ↭_{x,τ} a′. We write σ ∩_A τ for the set of common behaviours of σ and τ and define ⟨σ, τ⟩ = (inl⁻ · σ) ∪ (inr⁻ · τ) ∪ (σ ∩_A τ). The projections are defined using copycat: π₁ = {x ∈ *C*((A & B)⊥ ∥ A) | x ∩ (A⊥ ∥ A) ∈ cc_A↑} (and similarly for π₂).

**Theorem 5.** *CInv has products. As it is also* ∗*-autonomous, it is a model of MALL.*

It is easy to see that the interpretation of MALL⁻ in **CInv** following this structure coincides with ⟦·⟧; however, it is computed compositionally, without resorting to the (·)↑ operator. We deduce that our interpretation is invariant under cut elimination: if P → Q, then ⟦P⟧ = ⟦Q⟧. Putting the pieces together, we obtain the final result.

**Theorem 6.** *CInv is an injective and fully complete model of MALL*−*.*

## **7 Extensions and Related Work**

The model provides a representation of proofs which retains only the necessary sequentiality. We study the phenomenon in Linear Logic, but commuting conversions of additives arise in other languages, e.g. in functional languages with sums and products, where proof nets do not necessarily exist. Having an abstract representation of which reorderings are allowed could prove useful (reasoning on the possible commuting conversions in a language with sum types is notoriously difficult).

*Extensions.* Exponentials are difficult to add, as their conversions are not as canonical as those of MALL. Cyclic proofs [2] could be accommodated via recursive event structures.

Adding multiplicative units while keeping determinism is difficult, as their commuting conversions are subtle (e.g. proof equivalence for MLL with units is PSPACE-complete [18]) and exhibit apparent nondeterminism. For instance, the following proofs are convertible in MLL:

$$a().b[] \mid c[] \;\equiv\; a().(b[] \mid c[]) \;\equiv\; b[] \mid a().c[] \;\rhd\; a: \bot,\ b: 1,\ c: 1$$

where a().P is the process counterpart to the introduction of ⊥, and a[] to that of 1. Intuitively, b[] and c[] can both be performed at the start, but as soon as one is performed, the other has to wait for the input on a. This cannot be modelled inside deterministic general event structures, as it is only deterministic against an environment that will emit on a. In contrast, proofs of MALL⁻ remain deterministic even if their environment is not total.

We would also be interested in recasting multifocusing [9] in our setting, by defining a class of focussed causal nets, in which there is no concurrency between positive and negative events, and showing that sequentialisation always gives a focused proof.

*Related work.* The first fully complete model of MALL⁻ was based on closure operators [1], later extended to full Linear Logic [24]. True concurrency is used to define innocence, on which the full completeness result rests. However, their model does not take advantage of concurrency to account for permutations, as strategies are sequential. This investigation has been extended to concurrent strategies by Melliès and Mimram [25,26]. De Carvalho showed that the relational model is injective for MELL [11]. In another direction, [4] provides a fully complete model for MALL without game semantics, by using a glueing construction on the model of hypercoherences. [21] explores proof nets for a weaker theory of commuting conversions for MALL.

The idea of having intermediate representations between proof nets and proofs has been studied by Faggian and coauthors using l-nets [10,13–16], leading to an analysis similar to ours: they define a space of causal nets as partial orders and compare different versions of proofs with varying degrees of parallelism. Our work recasts this idea using event structures and adds the notion of causal completeness (keeping jumps that cannot be undone by a permutation, which leads naturally to stepping outside partial orders), as well as full completeness (which causal nets can be strongly sequentialised?).

The notion of dependency between logical rules has also been studied in [3] in the case of MLL. From a proof net R, they build a partial order D_{⅋,⊗}(R) which we believe is closely related to ⟦P⟧, where P is a sequentialisation of R. Indeed, in the case of MLL *without MIX*, a partial order is enough to capture the dependency between rules. The work [12] shows that the permutation rules of Linear Logic, understood as asynchronous optimisations on processes, are included in observational equivalence. [19] studies mutual embeddings between polarised proof nets [23] and the control π-calculus [20]. In another direction, we have recently built a fully abstract concurrent game semantics model of the synchronous session π-calculus [8]. The difficulty there was to understand name passing and the synchrony of the π-calculus, which is dual to our objective here: trying to understand the asynchrony behind the conversions of MALL⁻.

**Acknowledgements.** We would like to thank Willem Heijltjes, Domenico Ruoppolo, and Olivier Laurent for helpful discussions, and the anonymous referees for their insightful comments. This work has been partially sponsored by: EPSRC EP/K034413/1, EP/K011715/1, EP/L00058X/1, EP/N027833/1, and EP/N028201/1.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Rewriting Abstract Structures: Materialization Explained Categorically**

Andrea Corradini<sup>1</sup>, Tobias Heindel<sup>2</sup>, Barbara König<sup>3</sup>, Dennis Nolte<sup>3</sup>(B), and Arend Rensink<sup>4</sup>

> <sup>1</sup> Università di Pisa, Pisa, Italy andrea@di.unipi.it <sup>2</sup> University of Hawaii, Honolulu, USA heindel@hawaii.edu <sup>3</sup> Universität Duisburg-Essen, Duisburg, Germany {barbara_koenig,dennis.nolte}@uni-due.de <sup>4</sup> University of Twente, Enschede, Netherlands arend.rensink@utwente.nl

**Abstract.** The paper develops an abstract (over-approximating) semantics for double-pushout rewriting of graphs and graph-like objects. The focus is on the so-called materialization of left-hand sides from abstract graphs, a central concept in previous work. The first contribution is an accessible, general explanation of how materializations arise from universal properties and categorical constructions, in particular partial map classifiers, in a topos. Second, we introduce an extension by enriching objects with annotations and give a precise characterization of strongest post-conditions, which are effectively computable under certain assumptions.

# **1 Introduction**

Abstract interpretation [12] is a fundamental static analysis technique that applies not only to conventional programs but also to general infinite-state systems. Shape analysis [30], a specific instance of abstract interpretation, pioneered an approach for analyzing pointer structures that keeps track of information about the "heap topology", e.g., out-degrees or existence of certain paths. One central idea of shape analysis is *materialization*, which arises as companion operation to summarizing distinct objects that share relevant properties. Materialization, a.k.a. partial concretization, is also fundamental in verification approaches based on separation logic [5,6,24], where it is also known as rearrangement [26], a special case of frame inference. Shape analysis—construed in a wide sense—has been adapted to graph transformation [29], a general purpose modelling language for systems with dynamically evolving topology, such as network protocols and cyber-physical systems. Motivated by earlier work of shape analysis for graph

T. Heindel—Partially supported by AFOSR.

© The Author(s) 2019

M. Bojańczyk and A. Simpson (Eds.): FOSSACS 2019, LNCS 11425, pp. 169–188, 2019. https://doi.org/10.1007/978-3-030-17127-8_10

transformation [1,2,4,27,28,31], we want to put the materialization operation on a new footing, widening the scope of shape analysis.

A natural abstraction mechanism for transition systems with graphs as states "summarizes" all graphs over a specific *shape graph*. Thus a single graph is used as abstraction for all graphs that can be mapped homomorphically into it. Further annotations on shape graphs, such as cardinalities of preimages of its nodes and general first-order formulas, enable fine-tuning of the granularity of abstractions. While these natural abstraction principles have been successfully applied in previous work [1,2,4,27,28,31], their companion materialization constructions are notoriously difficult to develop, hard to understand, and are redrawn from scratch for every single setting. Thus, we set out to explain materializations based on mathematical principles, namely universal properties (in the sense of category theory). In particular, partial map classifiers in the topos of graphs (and its slice categories) cover the purely structural aspects of materializations; this is related to final pullback complements [13], a fundamental construction of graph rewriting [7,25]. Annotations of shape graphs are treated orthogonally via op-fibrations.

The first milestones of a general framework for shape analysis of graph transformation, and more generally rewriting of objects in a topos, are the following:

- A rewriting formalism for graph abstractions that lifts rule-based rewriting from single graphs to *abstract graphs*; it is developed for (abstract) objects in a topos.




*Related work:* The idea of shape graphs together with shape constraints was pioneered in [30], where the constraints are specified in a three-valued logic. A similar approach was proposed in [31], using first-order formulas as constraints. In partner abstraction [3,4], cluster abstraction [1,2], and neighbourhood abstraction [28], nodes are clustered according to local criteria, such as their neighbourhood, and the resulting graph structures are enriched with counting constraints, similar to ours. The idea of counting multiplicities of nodes and edges is also found in canonical graph shapes [27]. The uniform treatment of monoid annotations was introduced in previous work [9,10,20], in the context of type systems and with the aim of studying decidability and closure properties, but not for abstract rewriting.

### **2 Preliminaries**

This paper presupposes familiarity with category theory and the topos structure of graphs. Some concepts (in particular elementary topoi, subobject and partial map classifiers, and slice categories) are defined in the full version of this paper [8], which also contains all the proofs.

The rewriting formalism for graphs and graph-like structures that we use throughout the paper is the double-pushout (DPO) approach [11]. Although it was originally introduced for graphs [16], it is well-defined in any category **C**. However, certain standard results for graph rewriting require that the category **C** has "good" properties. The category of graphs is an elementary topos—an extremely rich categorical structure—but weaker conditions on **C**, for instance adhesivity, have been studied [14,15,21].

**Definition 1 (Double-pushout rewriting).** *A* production *in* **C** *is a span of monos* L ↢ I ↣ R *in* **C***; the objects* L *and* R *are called the left- and right-hand side, respectively. A* match *of a production* p: L ↢ I ↣ R

*to an object* X *of* **C** *is a mono* m_L : L ↣ X *in* **C***. The production* p *rewrites* X *to* Y *at* m_L *(resp. the match* m_L *to the* co-match m_R : R → Y*) if the production and the match (and the co-match) extend to a diagram in* **C***, shown below, such that both squares are pushouts:*

$$\begin{array}{ccccc} L & \longleftarrow & I & \longrightarrow & R \\ {\scriptstyle m_L}\big\downarrow & \mathrm{(PO)} & \big\downarrow & \mathrm{(PO)} & \big\downarrow{\scriptstyle m_R} \\ X & \longleftarrow & C & \longrightarrow & Y \end{array}$$

*In this case, we write* $X \overset{p,m\_L}{\Longrightarrow} Y$ *(resp.* $(L \overset{m\_L}{\rightarrowtail} X) \overset{p}{\Rightarrow} (R \overset{m\_R}{\rightarrowtail} Y)$*). We also write* $X \overset{p,m\_L}{\Longrightarrow}$ *if there exists an object* $Y$ *such that* $X \overset{p,m\_L}{\Longrightarrow} Y$*, and* $X \overset{p}{\Rightarrow} Y$ *if the specific match* $m\_L$ *is not relevant.*

Given a production $p$ and a match $m\_L$, the *gluing condition* is satisfied if there exist arrows $I \to C$ and $C \rightarrowtail X$ that make the left-hand square of the diagram in Definition 1 a pushout square, i.e., if the pushout complement of $I \rightarrowtail L \overset{m\_L}{\rightarrowtail} X$ exists.
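For the category of graphs with an injective match, the gluing condition reduces to the well-known dangling condition: no edge of $X$ outside the image of $L$ may be incident to a node that the production deletes. The following Python sketch (a hypothetical dict-based graph encoding, not from the paper) checks exactly this:

```python
# Sketch (assumed encoding, not from the paper): nodes are hashable values,
# X_edges maps edge ids to (source, target) pairs, and the injective match
# is given by dicts m_nodes / m_edges from L's items to X's items.

def gluing_condition_holds(L_nodes, I_nodes, X_edges, m_nodes, m_edges):
    """L_nodes: nodes of the left-hand side L;
    I_nodes: subset of L_nodes preserved by the production (image of phi_L);
    X_edges: edges of X as a dict edge -> (src, tgt);
    m_nodes / m_edges: the injective match from L into X."""
    deleted = {m_nodes[v] for v in L_nodes - I_nodes}  # nodes that p removes
    matched_edges = set(m_edges.values())              # edges inside the image of L
    for e, (s, t) in X_edges.items():
        if e not in matched_edges and (s in deleted or t in deleted):
            return False  # this edge would dangle after deleting its endpoint
    return True
```

For a mono match the identification condition holds vacuously, so this dangling check is the whole gluing condition in **Graph**.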

If **C** is an adhesive category (and thus also if it is a topos [22]) and the production consists of monos, then all remaining arrows of double-pushout diagrams of rewriting are monos [21] and the result of rewriting—be it the object Y or the co-match mR—is unique (up to a canonical isomorphism).

#### **2.1 Subobject Classifiers and Partial Map Classifiers of Graphs**

A standard category for graph rewriting that is also a topos is the category of edge-labelled, directed graphs that we shall use in examples, as recalled in the next definition. Note that due to the generality of the categorical framework, our results also hold for various other forms of graphs, such as node-labelled graphs, hypergraphs, graphs with scopes or graphs with second-order edges.

**Definition 2 (Category of graphs).** *Let* $\Lambda$ *be a fixed set of* edge labels*. A* ($\Lambda$-labelled) graph *is a tuple* $G = (V\_G, E\_G, \mathit{src}\_G, \mathit{tgt}\_G, \ell\_G)$ *where* $V\_G$ *is a finite set of* nodes*,* $E\_G$ *is a finite set of* edges*,* $\mathit{src}\_G, \mathit{tgt}\_G\colon E\_G \to V\_G$ *are the* source *and* target mappings *and* $\ell\_G\colon E\_G \to \Lambda$ *is the* labelling function*.*

*Let* $G, H$ *be two* $\Lambda$*-labelled graphs. A* graph morphism $\varphi\colon G \to H$ *consists of two functions* $\varphi\_V\colon V\_G \to V\_H$ *and* $\varphi\_E\colon E\_G \to E\_H$*, such that for each edge* $e \in E\_G$ *we have* $\mathit{src}\_H(\varphi\_E(e)) = \varphi\_V(\mathit{src}\_G(e))$*,* $\mathit{tgt}\_H(\varphi\_E(e)) = \varphi\_V(\mathit{tgt}\_G(e))$ *and* $\ell\_H(\varphi\_E(e)) = \ell\_G(e)$*. If* $\varphi\_V, \varphi\_E$ *are both bijective,* $\varphi$ *is an isomorphism. The category having (*$\Lambda$*-labelled) graphs as objects and graph morphisms as arrows is denoted by* **Graph***.*

We shall often write $\varphi$ instead of $\varphi\_V$ or $\varphi\_E$ to avoid clutter. The graph morphisms in our diagrams will be indicated by black and white nodes and thick edges. In the category **Graph**, where the objects are labelled graphs over the label alphabet $\Lambda$, the subobject classifier $true$ is displayed to the right, where every $\Lambda$-labelled edge represents several edges, one for each $\lambda \in \Lambda$.

The subobject classifier $true\colon \mathbf{1} \rightarrowtail \Omega$ from the terminal object $\mathbf{1}$ to $\Omega$ allows us to single out a subgraph $X$ of a graph $Y$, by mapping $Y$ to $\Omega$ in such a way that all elements of $X$ are mapped to the image of $true$.

Given arrows $\alpha, m$ as in the diagram in Definition 3, we can construct the most general pullback complement, called the final pullback complement [7,13].

**Definition 3 (Final pullback complement).** *A pair of arrows* $I \overset{\gamma}{\to} F \overset{\beta}{\to} G$ *is a* final pullback complement (FPBC) *of another pair* $I \overset{\alpha}{\to} L \overset{m}{\to} G$ *if*

*– the resulting square is a pullback, and*
*– for each pullback* $G \overset{m}{\leftarrow} L \overset{\alpha'}{\leftarrow} I' \overset{\gamma'}{\to} F' \overset{\beta'}{\to} G$ *and each arrow* $f\colon I' \to I$ *with* $\alpha \circ f = \alpha'$*, there exists a unique arrow* $f'\colon F' \to F$ *such that* $\beta \circ f' = \beta'$ *and* $f' \circ \gamma' = \gamma \circ f$ *hold.*

Final pullback complements and subobject classifiers are closely related to partial map classifiers (see [13, Corollary 4.6]): a category has FPBCs (over monos) and a subobject classifier if and only if it has a partial map classifier. These exist in all elementary topoi.

**Proposition 4 (Final pullback complements, subobject and partial map classifiers).** *Let* **C** *be a category with finite limits. Then the following are equivalent:*

*–* **C** *has a subobject classifier and final pullback complements along monos;*
*–* **C** *has a partial map classifier.*


#### **2.2 Languages**

The main theme of the paper is "simultaneous" rewriting of entire sets of objects of a category by means of rewriting a single *abstract* object that represents a collection of structures—the *language* of the abstract object. The simplest example of an abstract structure is a plain object of a category to which we associate the language of objects that can be mapped to it; the formal definition is as follows (see also [10]).

**Definition 5 (Language of an object).** *Let* $A$ *be an object of a category* **C***. Given another object* $X$*, we write* $X \rightsquigarrow A$ *whenever there exists an arrow from* $X$ *to* $A$*. We define the* language<sup>1</sup> *of* $A$*, denoted by* $\mathcal{L}(A)$*, as* $\mathcal{L}(A) = \{X \in \mathbf{C} \mid X \rightsquigarrow A\}$*.*

Whenever $X \in \mathcal{L}(A)$ holds, we will say that $X$ is *abstracted by* $A$, and $A$ is called the *abstract object*. In the following we will also need to characterize a class of (co-)matches which are represented by a given (co-)match (which is a mono).

**Definition 6 (Language of a mono).** *Let* $\varphi\colon L \rightarrowtail A$ *be a mono in* **C***. The* language *of* $\varphi$ *is the set of monos* $m$ *with source* $L$ *that factor* $\varphi$ *such that the square below is a pullback:*

$$\mathcal{L}(\varphi) = \left\{ m\colon L \rightarrowtail X \;\middle|\; \exists(\psi\colon X \to A).\ \begin{array}{ccc} L & \overset{m}{\rightarrowtail} & X \\ {\scriptstyle id\_L}\big\downarrow & {\scriptstyle (PB)} & \big\downarrow{\scriptstyle \psi} \\ L & \overset{\varphi}{\rightarrowtail} & A \end{array} \right\} \tag{1}$$

Intuitively, for any arrow $(L \overset{m}{\rightarrowtail} X) \in \mathcal{L}(\varphi)$ we have $X \in \mathcal{L}(A)$, and $X$ has a distinguished subobject $L$ which corresponds precisely to the subobject $L \rightarrowtail A$. In fact $\psi$ restricts and co-restricts to an isomorphism between the images of $L$ in $X$ and $A$. For graphs, no nodes or edges in $X$ outside of $L$ are mapped by $\psi$ into the image of $L$ in $A$.

# **3 Materialization**

Given a production $p\colon L \leftarrowtail I \rightarrowtail R$, an abstract object $A$, and a (possibly non-monic) arrow $\varphi\colon L \to A$, we want to transform the abstract object $A$ in order to characterize all successors of objects in $\mathcal{L}(A)$, i.e., those obtained by rewriting via $p$ at a match compatible with $\varphi$. (Note that $\varphi$ is not required to be monic, because a monic image of the left-hand side of $p$ in an object of $\mathcal{L}(A)$ could be mapped non-injectively to $A$.) Roughly, we want to lift DPO rewriting to the level of abstract objects.

For this, it is necessary to use the materialization construction, defined categorically in Sect. 3.1, that enables us to concretize an instance of a left-hand side in a given abstract object. This construction is refined in Sect. 3.2 where we restrict to materializations that satisfy the gluing condition and can thus be rewritten via p. Finally in Sect. 3.3 we present the main result about materializations showing that we can fully characterize the co-matches obtained by rewriting.

<sup>1</sup> Here we assume that **C** is essentially small, so that a language can be seen as a set instead of a proper class of objects.

#### **3.1 Materialization Category and Existence of Materialization**

From now on we assume **C** to be an elementary topos. We will now define the materialization, which, given an arrow $\varphi\colon L \to A$, characterizes all objects $X$, abstracted by $A$, which contain a (monic) occurrence of the left-hand side compatible with $\varphi$.

**Definition 7 (Materialization).** *Let* $\varphi\colon L \to A$ *be an arrow in* **C***. The* materialization category *for* $\varphi$*, denoted* $\mathbf{Mat}\_\varphi$*, has as*

**objects** *all factorizations* $L \rightarrowtail X \to A$ *of* $\varphi$ *whose first factor* $L \rightarrowtail X$ *is a mono, and as*
**arrows** *from a factorization* $L \rightarrowtail X \to A$ *to another one* $L \rightarrowtail Y \to A$*, all arrows* $f\colon X \to Y$ *in* **C** *such that the diagram to the right is made of a commutative triangle and a pullback square.*

*If* $\mathbf{Mat}\_\varphi$ *has a terminal object, it is denoted by* $L \rightarrowtail \langle\varphi\rangle \to A$ *and is called the* materialization *of* $\varphi$*.*

Sometimes we will also call the object $\langle\varphi\rangle$ the materialization of $\varphi$, omitting the arrows.

Since we are working in a topos by assumption, the slice category over $A$ provides us with a convenient setting to construct materializations. Note in particular that in the diagram in Definition 7 above, the span $X \leftarrowtail L \overset{id\_L}{\longrightarrow} L$ is a partial map from $X$ to $L$ in the slice category over $A$. Hence the materialization $\langle\varphi\rangle$ corresponds to the partial map classifier for $L$ in this slice category.

**Proposition 8 (Existence of materialization).** *Let* $\varphi\colon L \to A$ *be an arrow in* **C***, and let* $\eta\_\varphi\colon \varphi \to F(\varphi)$*, with* $F(\varphi)\colon \bar{A} \to A$*, be the partial map classifier of* $\varphi$ *in the slice category* $\mathbf{C}\downarrow A$ *(which also is a topos).*<sup>2</sup> *Then* $L \overset{\eta\_\varphi}{\rightarrowtail} \bar{A} \overset{F(\varphi)}{\longrightarrow} A$ *is the materialization of* $\varphi$*, hence* $\langle\varphi\rangle = \bar{A}$*.*

As a direct consequence of Propositions 4 and 8 (and the fact that final pullback complements in the slice category correspond to those in the base category [25]), the terminal object of the materialization category can be constructed for each arrow of a topos by taking final pullback complements.

**Corollary 9 (Construction of the materialization).** *Let* $\varphi\colon L \to A$ *be an arrow of* **C** *and let* $true\_A\colon A \rightarrowtail A \times \Omega$ *be the subobject classifier (in the slice category* $\mathbf{C}\downarrow A$*) from* $id\_A\colon A \to A$ *to the projection* $\pi\_1\colon A \times \Omega \to A$*. Then the terminal object* $L \overset{\eta\_\varphi}{\rightarrowtail} \langle\varphi\rangle \overset{\psi}{\to} A$ *in the materialization category consists of the arrows* $\eta\_\varphi$ *and* $\psi = \pi\_1 \circ \chi\_{\eta\_\varphi}$*, where* $L \overset{\eta\_\varphi}{\rightarrowtail} \langle\varphi\rangle \overset{\chi\_{\eta\_\varphi}}{\longrightarrow} A \times \Omega$ *is the final pullback complement of* $L \overset{\varphi}{\to} A \overset{true\_A}{\rightarrowtail} A \times \Omega$*:*

$$\begin{array}{ccccc} L & \overset{\eta\_\varphi}{\rightarrowtail} & \langle\varphi\rangle & & \\ {\scriptstyle \varphi}\big\downarrow & {\scriptstyle (FPBC)} & \big\downarrow{\scriptstyle \chi\_{\eta\_\varphi}} & & \\ A & \overset{true\_A}{\rightarrowtail} & A \times \Omega & \overset{\pi\_1}{\longrightarrow} & A \end{array}$$

<sup>2</sup> This is by the Fundamental Theorem of topos theory [17, Theorem 2.31].

*Example 10.* We construct the materialization $L \overset{\eta\_\varphi}{\rightarrowtail} \langle\varphi\rangle \overset{\psi}{\to} A$ for a morphism $\varphi\colon L \to A$ of graphs with a single (omitted) label (the graphs $L$ and $A$ are given in a figure not reproduced here).

In particular, the materialization is obtained as a final pullback complement as depicted to the right (compare with the corresponding diagram in Corollary 9). Note that edges which are not in the image of η<sup>ϕ</sup> resp. true<sup>A</sup> are dashed.

This construction corresponds to the usual intuition behind materialization: the left-hand side and the edges that are attached to it are "pulled out" of the given abstract graph.

We can summarize the result of our constructions in the following proposition:

**Proposition 11 (Language of the materialization).** *Let* $\varphi\colon L \to A$ *be an arrow in* **C** *and let* $L \overset{\eta\_\varphi}{\rightarrowtail} \langle\varphi\rangle \to A$ *be the corresponding materialization. Then we have*

$$\mathcal{L}(L \overset{\eta\_\varphi}{\rightarrowtail} \langle\varphi\rangle) = \{ L \overset{m\_L}{\rightarrowtail} X \mid \exists \psi\colon (X \to A).\ (\varphi = \psi \circ m\_L) \}.$$

#### **3.2 Characterizing the Language of Rewritable Objects**

A match obtained through the materialization of the left-hand side of a production from a given object may not allow a DPO rewriting step because of the gluing condition. We illustrate this problem with an example.

*Example 12.* Consider the materialization $L \rightarrowtail \langle\varphi\rangle \to A$ from Example 10 and the production $L \leftarrowtail I \rightarrowtail R$ shown in the diagram to the right. It is easy to see that the pushout complement of the morphisms $I \rightarrowtail L \rightarrowtail \langle\varphi\rangle$ does not exist.

Nevertheless there exist factorizations $L \rightarrowtail X \to A$ abstracted by $\langle\varphi\rangle$ that could be rewritten using the production.

In order to take the existence of pushout complements into account, we consider a subcategory of the materialization category.

**Definition 13 (Materialization subcategory of rewritable objects).** *Let* $\varphi\colon L \to A$ *be an arrow of* **C** *and let* $\varphi\_L\colon I \rightarrowtail L$ *be a mono (corresponding to the left leg of a production). The* materialization subcategory of rewritable objects *for* $\varphi$ *and* $\varphi\_L$*, denoted* $\mathbf{Mat}\_\varphi^{\varphi\_L}$*, is the full subcategory of* $\mathbf{Mat}\_\varphi$ *containing as objects all factorizations* $L \overset{m}{\rightarrowtail} X \to A$ *of* $\varphi$*, where* $m$ *is a mono and* $I \overset{\varphi\_L}{\rightarrowtail} L \overset{m}{\rightarrowtail} X$ *has a pushout complement.*

*Its terminal element, if it exists, is denoted by* $L \overset{n\_L}{\rightarrowtail} \langle\langle\varphi, \varphi\_L\rangle\rangle \to A$ *and is called the* rewritable materialization*.*

We show that this subcategory of the materialization category has a terminal object.

**Proposition 14 (Construction of the rewritable materialization).** *Let* $\varphi\colon L \to A$ *be an arrow and let* $\varphi\_L\colon I \rightarrowtail L$ *be a mono of* **C***. Then the* rewritable materialization of $\varphi$ w.r.t. $\varphi\_L$ *exists and can be constructed as the following factorization* $L \overset{n\_L}{\rightarrowtail} \langle\langle\varphi, \varphi\_L\rangle\rangle \overset{\psi \circ \alpha}{\longrightarrow} A$ *of* $\varphi$*. In the left diagram,* $F$ *is obtained as the final pullback complement of* $I \overset{\varphi\_L}{\rightarrowtail} L \rightarrowtail \langle\varphi\rangle$*, where* $L \rightarrowtail \langle\varphi\rangle \overset{\psi}{\to} A$ *is the materialization of* $\varphi$ *(Definition 7). Next, in the right diagram, the cospan* $L \overset{n\_L}{\rightarrowtail} \langle\langle\varphi, \varphi\_L\rangle\rangle \overset{\beta}{\leftarrowtail} F$ *is the pushout of the span* $L \overset{\varphi\_L}{\leftarrowtail} I \rightarrowtail F$ *and* $\alpha$ *is the resulting mediating arrow.*

*Example 15.* We come back to the running example (Example 12) and, as in Proposition 14, determine the final pullback complement $I \rightarrowtail F \to \langle\varphi\rangle$ of $I \overset{\varphi\_L}{\rightarrowtail} L \rightarrowtail \langle\varphi\rangle$ (see diagram below left) and obtain $\langle\langle\varphi, \varphi\_L\rangle\rangle$ by taking the pushout over $L \leftarrowtail I \rightarrowtail F$ (see diagram below right).

It remains to be shown that $L \rightarrowtail \langle\langle\varphi, \varphi\_L\rangle\rangle \to A$ represents every factorization which can be rewritten. As before we obtain a characterization of the rewritable objects, including the match, as the language of an arrow.

**Proposition 16 (Language of the rewritable materialization).** *Assume there is a production* $p\colon L \overset{\varphi\_L}{\leftarrowtail} I \overset{\varphi\_R}{\rightarrowtail} R$ *and let* $L \overset{n\_L}{\rightarrowtail} \langle\langle\varphi, \varphi\_L\rangle\rangle$ *be the match for the rewritable materialization for* $\varphi$ *and* $\varphi\_L$*. Then we have*

$$\mathcal{L}(L \stackrel{n\_L}{\longmapsto} \langle \langle \varphi, \varphi\_L \rangle \rangle) = \{ L \stackrel{m\_L}{\longmapsto} X \mid \exists \psi \colon (X \to A) . \ (\varphi = \psi \circ m\_L \land X \stackrel{p, m\_L}{\Longrightarrow}) \}.$$

#### **3.3 Rewriting Materializations**

In the next step we will now rewrite the rewritable materialization $\langle\langle\varphi, \varphi\_L\rangle\rangle$ with the match $L \overset{n\_L}{\rightarrowtail} \langle\langle\varphi, \varphi\_L\rangle\rangle$, resulting in a co-match $R \rightarrowtail B$. In particular, we will show that this co-match represents all co-matches that can be obtained by rewriting an object $X$ of $\mathcal{L}(A)$ at a match compatible with $\varphi$. We first start with an example.

*Example 17.* We can rewrite the materialization $L \rightarrowtail \langle\langle\varphi, \varphi\_L\rangle\rangle \to A$ as follows:

**Proposition 18 (Rewriting abstract matches).** *Let a match* $n\_L\colon L \rightarrowtail \tilde{A}$ *and a production* $p\colon L \leftarrowtail I \rightarrowtail R$ *be given. Assume that* $\tilde{A}$ *is rewritten along the match* $n\_L$*, i.e.,* $(L \overset{n\_L}{\rightarrowtail} \tilde{A}) \overset{p}{\Rightarrow} (R \overset{n\_R}{\rightarrowtail} B)$*. Then*

$$\mathcal{L}(R \overset{n\_R}{\rightarrowtail} B) = \{ R \overset{m\_R}{\rightarrowtail} Y \mid \exists (L \overset{m\_L}{\rightarrowtail} X) \in \mathcal{L}(L \overset{n\_L}{\rightarrowtail} \tilde{A}).\ (L \overset{m\_L}{\rightarrowtail} X) \overset{p}{\Rightarrow} (R \overset{m\_R}{\rightarrowtail} Y) \}$$

If we combine Propositions 16 and 18, we obtain the following corollary that characterizes the co-matches obtained from rewriting a match compatible with $\varphi\colon L \to A$.

**Corollary 19 (Co-match language of the rewritable materialization).** *Let* $\varphi\colon L \to A$ *and a production* $p\colon L \overset{\varphi\_L}{\leftarrowtail} I \overset{\varphi\_R}{\rightarrowtail} R$ *be given. Assume that* $\langle\langle\varphi, \varphi\_L\rangle\rangle$ *is obtained as the rewritable materialization of* $\varphi$ *and* $\varphi\_L$ *with match* $L \overset{n\_L}{\rightarrowtail} \langle\langle\varphi, \varphi\_L\rangle\rangle$ *(see Proposition 14). Furthermore let* $(L \overset{n\_L}{\rightarrowtail} \langle\langle\varphi, \varphi\_L\rangle\rangle) \overset{p}{\Rightarrow} (R \overset{n\_R}{\rightarrowtail} B)$*. Then*

$$\mathcal{L}(R \overset{n\_R}{\rightarrowtail} B) = \{ R \overset{m\_R}{\rightarrowtail} Y \mid \exists (L \overset{m\_L}{\rightarrowtail} X), (X \overset{\psi}{\to} A).\ (\varphi = \psi \circ m\_L \,\land\, (L \overset{m\_L}{\rightarrowtail} X) \overset{p}{\Rightarrow} (R \overset{m\_R}{\rightarrowtail} Y)) \}$$

This result does not yet enable us to construct post-conditions for languages of objects. The set of co-matches can be fully characterized as the language of a mono, which can only be achieved by fixing the right-hand side R and thus ensuring that exactly one occurrence of R is represented. However, as soon as we forget about the co-match, this effect is gone and can only be retrieved by adding annotations, which will be introduced next.

# **4 Annotated Objects**

We now endow objects with annotations, thus making object languages more expressive. In particular we will use ordered monoids in order to annotate objects. Similar annotations have already been studied in [20] in the context of type systems and in [10] with the aim of studying decidability and closure properties, but not for abstract rewriting.

**Definition 20 (Ordered monoid).** *An* ordered monoid $(M, +, \le)$ *consists of a set* $M$*, a partial order* $\le$ *and a binary operation* $+$ *such that* $(M, +)$ *is a monoid with unit* $0$ *(which is the bottom element w.r.t.* $\le$*) and the partial order is compatible with the monoid operation. In particular* $a \le b$ *implies* $a + c \le b + c$ *and* $c + a \le c + b$ *for all* $a, b, c \in M$*. An ordered monoid is commutative if* $+$ *is commutative.*

*A tuple* $(M, +, -, \le)$*, where* $(M, +, \le)$ *is an ordered monoid and* $-$ *is a binary operation on* $M$*, is called an* ordered monoid with subtraction*.*

*We say that subtraction is* well-behaved *whenever for all* $a, b \in M$ *it holds that* $a - a = 0$ *and* $(a - b) + b = a$ *whenever* $b \le a$*.*

For now subtraction is just any operation, without specific requirements. Later we will concentrate on specific subtraction operations and demand that they are well-behaved.

In the following we will consider only commutative monoids.

**Definition 21 (Monotone maps and homomorphisms).** *Let* $M\_1, M\_2$ *be two ordered monoids. A map* $h\colon M\_1 \to M\_2$ *is called* monotone *if* $a \le b$ *implies* $h(a) \le h(b)$ *for all* $a, b \in M\_1$*. The category of ordered monoids with subtraction and monotone maps is called* **Mon***.*

*A monotone map* $h$ *is called a* homomorphism *if* $h(0) = 0$ *and* $h(a + b) = h(a) + h(b)$*. If* $M\_1, M\_2$ *are ordered monoids with subtraction, we say that* $h$ *preserves subtraction if* $h(a - b) = h(a) - h(b)$*.*

*Example 22.* Let $n \in \mathbb{N}\backslash\{0\}$ and take $M\_n = \{0, 1, \dots, n, *\}$ (zero, one, ..., $n$, many) with $0 \le 1 \le \dots \le n \le *$ and addition as (commutative) monoid operation, with the proviso that $a + b = *$ if the sum is larger than $n$. In addition $a + * = *$ for all $a \in M\_n$. Subtraction is truncated subtraction, where $a - b = 0$ if $a \le b$. Furthermore $* - a = *$ for all $a \in \mathbb{N}$. It is easy to see that subtraction is well-behaved.
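The arithmetic of $M\_n$ is small enough to sketch directly. The following Python encoding (illustrative, not from the paper; `STAR` stands for $*$) implements addition, truncated subtraction, and the order:

```python
# Sketch of the counting monoid M_n from Example 22.
STAR = '*'  # the top element "many"

def mn_add(a, b, n):
    """Addition in M_n: sums above n collapse to STAR."""
    if a == STAR or b == STAR:
        return STAR
    return a + b if a + b <= n else STAR

def mn_sub(a, b, n):
    """Truncated subtraction: a - b = 0 if a <= b; STAR - a = STAR for
    a in {0,...,n}; STAR - STAR = 0 (so that a - a = 0 holds)."""
    if a == STAR:
        return 0 if b == STAR else STAR
    if b == STAR:
        return 0
    return max(a - b, 0)

def mn_leq(a, b):
    """The order 0 <= 1 <= ... <= n <= STAR."""
    return b == STAR or (a != STAR and a <= b)
```

One can check well-behavedness on samples: `mn_sub(a, a, n) == 0` and `mn_add(mn_sub(a, b, n), b, n) == a` whenever `mn_leq(b, a)` holds for numeric `a`.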

Given a set <sup>S</sup> and an ordered monoid (with subtraction) <sup>M</sup>, it is easy to check that also <sup>M</sup><sup>S</sup> is an ordered monoid (with subtraction), where the elements are functions from <sup>S</sup> to <sup>M</sup> and the partial order, the monoidal operation and the subtraction are taken pointwise.

The following path monoid is useful if we want to annotate a graph with information over which paths are present. Note that due to the possible fusion of nodes and edges caused by the abstraction, a path in the abstract graph does not necessarily imply the existence of a corresponding path in a concrete graph. Hence annotations based on such a monoid, which provide information about the existence of paths, can yield useful additional information.

*Example 23.* Given a graph $G$, we denote by $E\_G^+ \subseteq V\_G \times V\_G$ the transitive closure of the edge relation $E\_G^{\rightarrow} = \{(\mathit{src}\_G(e), \mathit{tgt}\_G(e)) \mid e \in E\_G\}$. The *path monoid* $\mathcal{P}\_G$ of $G$ has the carrier set $\mathcal{P}(E\_G^+)$. The partial order is simply inclusion and the monoid operation is defined as follows: given $P\_0, P\_1 \in \mathcal{P}\_G$, we have

$$P\_0 + P\_1 = \{ (v\_0, v\_n) \mid \exists v\_1, \dots, v\_{n-1}\colon (v\_i, v\_{i+1}) \in P\_{j\_i},\ j\_0 \in \{0, 1\},\ j\_{i+1} = 1 - j\_i,\ i \in \{0, \dots, n-1\} \text{ and } n \in \mathbb{N} \}.$$

That is, new paths can be formed by concatenating alternating path fragments from $P\_0$ and $P\_1$. It is easy to see that $+$ is commutative, and one can also show associativity. The empty set $P = \emptyset$ is the unit. Subtraction simply returns the first parameter: $P\_0 - P\_1 = P\_0$.
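The monoid operation can be computed by a fixed-point search over endpoints of alternating fragments. A Python sketch (illustrative encoding, not from the paper; path sets are sets of pairs, and we take $n \ge 1$ fragments):

```python
def path_add(P0, P1):
    """Monoid operation of the path monoid: all pairs (v0, vn) reachable by
    concatenating fragments that strictly alternate between P0 and P1."""
    fragments = (frozenset(P0), frozenset(P1))
    result = set()
    # states: (start node, current end node, index of set the last fragment used)
    frontier = {(v, w, j) for j in (0, 1) for (v, w) in fragments[j]}
    seen = set(frontier)
    while frontier:
        new = set()
        for (v, w, j) in frontier:
            result.add((v, w))
            for (x, y) in fragments[1 - j]:      # alternate to the other set
                if x == w and (v, y, 1 - j) not in seen:
                    seen.add((v, y, 1 - j))
                    new.add((v, y, 1 - j))
        frontier = new
    return result
```

Note that `path_add(P, P)` yields the transitive closure of `P`, while `path_add(P, set())` returns `P`, matching the claim that the empty set is the unit.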

We will now formally define annotations for objects via a functor from a given category to **Mon**.

**Definition 24 (Annotations for objects).** *Given a category* **C** *and a functor* $\mathcal{A}\colon \mathbf{C} \to \mathbf{Mon}$*, an* annotation based on $\mathcal{A}$ *for an object* $X \in \mathbf{C}$ *is an element* $a \in \mathcal{A}(X)$*. We write* $\mathcal{A}\_\varphi$*, instead of* $\mathcal{A}(\varphi)$*, for the action of the functor* $\mathcal{A}$ *on a* **C***-arrow* $\varphi$*. We assume that for each object* $X$ *there is a* standard annotation *based on* $\mathcal{A}$ *that we denote by* $s\_X$*, thus* $s\_X \in \mathcal{A}(X)$*.*

It can be shown quite straightforwardly that the forgetful functor mapping an annotated object <sup>X</sup>[a], with <sup>a</sup> ∈ A(X), to <sup>X</sup> is an op-fibration (or co-fibration [19]), arising via the Grothendieck construction.

Our first example is an annotation of graphs with global multiplicities, counting nodes and edges, where the action of the functor is to sum up those multiplicities.

*Example 25.* Given $n \in \mathbb{N}\backslash\{0\}$, we define the functor $\mathcal{B}^n\colon \mathbf{Graph} \to \mathbf{Mon}$ as follows: for every graph $G$, $\mathcal{B}^n(G) = M\_n^{V\_G \cup E\_G}$. For every graph morphism $\varphi\colon G \to H$ and $a \in \mathcal{B}^n(G)$, we have $\mathcal{B}^n\_\varphi(a) \in M\_n^{V\_H \cup E\_H}$ with:

$$\mathcal{B}^n\_\varphi(a)(y) = \sum\_{\varphi(x) = y} a(x), \quad where \ x \in (V\_G \cup E\_G) \text{ and } y \in (V\_H \cup E\_H).$$

Therefore an annotation based on a functor <sup>B</sup><sup>n</sup> associates every item of a graph with a number (or the top value ∗). We will call such annotations *multiplicities*. Furthermore the action of the functor on a morphism transforms a multiplicity by summing up (in Mn) the values of all items of the source graph that are mapped to the same item of the target graph.

For a graph $G$, its *standard multiplicity* $s\_G \in \mathcal{B}^n(G)$ is defined as the function which maps every node and edge of $G$ to 1.
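The action of $\mathcal{B}^n\_\varphi$ can be sketched by pushing a multiplicity forward along a morphism, summing in $M\_n$. A Python illustration (assumed encoding, reusing the $M\_n$ arithmetic from Example 22; `STAR` stands for $*$):

```python
# Pushing a multiplicity along a graph morphism, as in Example 25.
STAR = '*'

def mn_add(a, b, n):
    """Addition in M_n (see Example 22): sums above n collapse to STAR."""
    if a == STAR or b == STAR:
        return STAR
    return a + b if a + b <= n else STAR

def push_multiplicity(a, phi, items_H, n):
    """a: multiplicity on G's items (nodes and edges together);
    phi: map from G's items to H's items;
    items_H: the items of the target graph H."""
    b = {y: 0 for y in items_H}          # 0 is the unit of M_n
    for x, y in phi.items():
        b[y] = mn_add(b[y], a[x], n)     # sum the preimage multiplicities
    return b
```

For instance, merging two nodes with standard multiplicity 1 under $n = 1$ yields the value `STAR` ("many") on their common image.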

As another example we consider local annotations which record the outdegree of a node and where the action of the functor is to take the supremum instead of the sum.

*Example 26.* Given $n \in \mathbb{N}\backslash\{0\}$, we define the functor $\mathcal{S}^n\colon \mathbf{Graph} \to \mathbf{Mon}$ as follows: for every graph $G$, $\mathcal{S}^n(G) = M\_n^{V\_G}$. For every graph morphism $\varphi\colon G \to H$ and $a \in \mathcal{S}^n(G)$, we have $\mathcal{S}^n\_\varphi(a) \in M\_n^{V\_H}$ with:

$$\mathcal{S}^n\_\varphi(a)(w) = \bigvee\_{\varphi(v) = w} a(v), \quad where \ v \in V\_G \text{ and } w \in V\_H.$$

For a graph $G$, its *standard annotation* $s\_G \in \mathcal{S}^n(G)$ is defined as the function which maps every node of $G$ to its out-degree (or $*$ if the out-degree is larger than $n$).

Finally, we consider annotations based on the path monoid (see Example 23).

*Example 27.* We define the functor $\mathcal{T}\colon \mathbf{Graph} \to \mathbf{Mon}$ as follows: for every graph $G$, $\mathcal{T}(G) = \mathcal{P}\_G$. For every graph morphism $\varphi\colon G \to H$ and $P \in \mathcal{T}(G)$, we have $\mathcal{T}\_\varphi(P) \in \mathcal{P}\_H$ with:

$$\mathcal{T}\_{\varphi}(P) = \{ (\varphi(v), \varphi(w)) \mid (v, w) \in P \}.$$

For a graph $G$, its *standard annotation* $s\_G \in \mathcal{T}(G)$ is the transitive closure of the edge relation, i.e., $s\_G = E\_G^+$.

In the following we will consider only annotations satisfying certain properties in order to achieve soundness and completeness.

**Definition 28 (Properties of annotations).** *Let* A : **C** → **Mon** *be an annotation functor, together with standard annotations. In this setting we say that*

*–* $\mathcal{A}\_\varphi\colon \mathcal{A}(A) \to \mathcal{A}(B)$ *has a right adjoint* $red\_\varphi\colon \mathcal{A}(B) \to \mathcal{A}(A)$*, i.e.,* $red\_\varphi$ *is monotone and satisfies* $a \le red\_\varphi(\mathcal{A}\_\varphi(a))$ *for* $a \in \mathcal{A}(A)$ *and* $\mathcal{A}\_\varphi(red\_\varphi(b)) \le b$ *for* $b \in \mathcal{A}(B)$*.*<sup>3</sup>

<sup>3</sup> This amounts to saying that the forgetful functor is a bifibration when we restrict to monos, see [19, Lem. 9.1.2].


*Furthermore, assuming that* $\mathcal{A}\_\varphi$ *has a right adjoint* $red\_\varphi$*, we say that*

*– the* pushout property *holds, whenever for each pushout as shown in the diagram to the right, with all arrows monos, where* $\eta = \psi\_1 \circ \varphi\_1 = \psi\_2 \circ \varphi\_2$*, it holds that for every* $d \in \mathcal{A}(D)$*:*

$$d = \mathcal{A}\_{\psi\_1}(red\_{\psi\_1}(d)) + (\mathcal{A}\_{\psi\_2}(red\_{\psi\_2}(d)) - \mathcal{A}\_{\eta}(red\_{\eta}(d))).$$

*We say that the* pushout property for standard annotations *holds if we replace* $d$ *by* $s\_D$*,* $red\_\eta(d)$ *by* $s\_A$*,* $red\_{\psi\_1}(d)$ *by* $s\_B$ *and* $red\_{\psi\_2}(d)$ *by* $s\_C$*.*

*– the* Beck-Chevalley property *holds if, whenever the square shown to the right is a pullback with* $\varphi\_1, \psi\_2$ *mono, it holds for every* $b \in \mathcal{A}(B)$ *that*

$$\mathcal{A}\_{\varphi\_2}(red\_{\varphi\_1}(b)) = red\_{\psi\_2}(\mathcal{A}\_{\psi\_1}(b)).$$

Note that the annotation functor from Example 25 satisfies all properties above, whereas the functors from Examples 26 and 27 satisfy both the homomorphism property and the pushout property for standard annotations, but do not satisfy all the remaining requirements [8].

We will now introduce a more flexible notion of language, by equipping the abstract objects with two annotations, establishing lower and upper bounds.

**Definition 29 (Doubly annotated object).** *Given a topos* **C** *and a functor* $\mathcal{A}\colon \mathbf{C} \to \mathbf{Mon}$*, a* doubly annotated object $A[a\_1, a\_2]$ *is an object* $A$ *of* **C** *with two annotations* $a\_1, a\_2 \in \mathcal{A}(A)$*. An arrow* $\varphi\colon A[a\_1, a\_2] \to B[b\_1, b\_2]$*, also called a* legal arrow*, is a* **C***-arrow* $\varphi\colon A \to B$ *such that* $\mathcal{A}\_\varphi(a\_1) \ge b\_1$ *and* $\mathcal{A}\_\varphi(a\_2) \le b\_2$*.*

*The* language of a doubly annotated object A[a1, a2] *(also called the language of objects which are abstracted by* A[a1, a2]*) is defined as follows:*

$$\mathcal{L}(A[a\_1, a\_2]) = \{ X \in \mathbf{C} \mid \text{there exists a legal arrow } \varphi \colon X[s\_X, s\_X] \to A[a\_1, a\_2] \}$$

Note that legal arrows are closed under composition [9]. Examples of doubly annotated objects are given in Example 36 for global annotations from Example 25 (providing upper and lower bounds for the number of nodes resp. edges in the preimage of a given element). Graph elements without annotation are annotated by [0, <sup>∗</sup>] by default.
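For pointwise multiplicity annotations, legality of an arrow amounts to two pointwise order checks on the target's items. A small Python sketch (hypothetical encoding, not from the paper; it takes the already pushed-forward annotations $\mathcal{A}\_\varphi(a\_1), \mathcal{A}\_\varphi(a\_2)$ as inputs, with `STAR` standing for $*$):

```python
# Legality check of Definition 29 for pointwise M_n-valued annotations.
STAR = '*'

def mn_leq(a, b):
    """The order 0 <= 1 <= ... <= n <= STAR of Example 22."""
    return b == STAR or (a != STAR and a <= b)

def is_legal(pushed_a1, pushed_a2, b1, b2):
    """pushed_a1 / pushed_a2: images A_phi(a1), A_phi(a2) of the source bounds;
    b1 / b2: lower and upper bound annotations on the target.
    Legal iff A_phi(a1) >= b1 and A_phi(a2) <= b2, checked pointwise."""
    return (all(mn_leq(b1[y], pushed_a1[y]) for y in b1) and
            all(mn_leq(pushed_a2[y], b2[y]) for y in b2))
```

The default bounds $[0, *]$ mentioned above make both checks trivially true for unannotated items, which is why they can be omitted in diagrams.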

**Definition 30 (Isomorphism property).** *An annotation functor* $\mathcal{A}\colon \mathbf{C} \to \mathbf{Mon}$*, together with standard annotations, satisfies the* isomorphism property *if the following holds: whenever* $\varphi\colon X[s_X, s_X] \to Y[s_Y, s_Y]$ *is legal, then* $\varphi$ *is an isomorphism, i.e.,* $\mathcal{L}(Y[s_Y, s_Y])$ *contains only* $Y$ *itself (and objects isomorphic to* $Y$*).*

# **5 Abstract Rewriting of Annotated Objects**

We will now show how to actually rewrite annotated objects. The challenge is both to find suitable annotations for the materialization and to "rewrite" the annotations.

#### **5.1 Abstract Rewriting and Soundness**

We first describe how the annotated rewritable materialization is constructed and then we investigate its properties.

**Definition 31 (Construction of annotated rewritable materialization).** *Let* $p\colon L \xleftarrow{\varphi_L} I \xrightarrow{\varphi_R} R$ *be a production and let* $A[a_1, a_2]$ *be a doubly annotated object. Furthermore let* $\varphi\colon L \to A$ *be an arrow.*

*We first construct the factorization* $L \xrightarrow{n_L} \langle\langle\varphi, \varphi_L\rangle\rangle \xrightarrow{\psi} A$*, obtaining the rewritable materialization* $\langle\langle\varphi, \varphi_L\rangle\rangle$ *from Definition 13. Next, let* $M$ *contain all maximal*<sup>4</sup> *elements of the set*

$$\{(a\_1', a\_2') \in \mathcal{A}(\langle\langle\varphi, \varphi\_L\rangle\rangle)^2 \mid \mathcal{A}\_{n\_L}(s\_L) \le a\_2',\ a\_1 \le \mathcal{A}\_{\psi}(a\_1'),\ \mathcal{A}\_{\psi}(a\_2') \le a\_2\}.$$

*Then the doubly annotated objects* $\langle\langle\varphi, \varphi_L\rangle\rangle[a_1', a_2']$ *with* $(a_1', a_2') \in M$ *are the annotated rewritable materializations for* $A[a_1, a_2]$*,* $\varphi$ *and* $\varphi_L$*.*

Note that in general there can be several such materializations, differing only in their annotations, or possibly none. The definition of $M$ ensures that the upper bound $a_2'$ of the materialization covers the annotations arising from the left-hand side. We cannot impose a corresponding condition on the lower bound, since the materialization might contain additional structure; hence the arrow $n_L$ is only "semi-legal". A more symmetric condition will be studied in Sect. 5.2.

**Proposition 32 (Annotated rewritable materialization is terminal).** *Given a production* $p\colon L \xleftarrow{\varphi_L} I \xrightarrow{\varphi_R} R$*, let* $L \xrightarrow{m_L} X$ *be the match of* $L$ *in an object* $X$ *such that* $X \stackrel{p,m_L}{\Longrightarrow}$*, i.e.,* $X$ *can be rewritten. Assume that* $X$ *is abstracted by* $A[a_1, a_2]$*, witnessed by* $\psi$*. Let* $\varphi = \psi \circ m_L$ *and let* $L \xrightarrow{n_L} \langle\langle\varphi, \varphi_L\rangle\rangle \xrightarrow{\bar\psi} A$ *be the corresponding rewritable materialization. Then there exists an arrow* $\zeta_A$ *and a pair of annotations* $(a_1', a_2') \in M$ *for* $\langle\langle\varphi, \varphi_L\rangle\rangle$ *(as described in Definition 31) such that the diagram below commutes and the square is a pullback in the underlying category. Furthermore the triangle consists of legal arrows. This means in particular that* $\zeta_A$ *is legal.*

$$\begin{array}{ccc} L[s\_L, s\_L] & \xleftarrow{m\_L} X[s\_X, s\_X] & \xrightarrow{\psi} A[a\_1, a\_2] \\ {\scriptstyle id\_L}\downarrow \;\text{(PB)} & {\scriptstyle \zeta\_A}\downarrow & {\nearrow}{\scriptstyle \bar\psi} \\ L[s\_L, s\_L] & \xrightarrow{n\_L} \langle\langle\varphi, \varphi\_L\rangle\rangle[a\_1', a\_2'] & \end{array}$$

<sup>4</sup> "Maximal" means maximality with respect to the interval order $(a_1, a_2) \sqsubseteq (a_1', a_2') \iff a_1' \le a_1,\ a_2 \le a_2'$.
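A small executable sketch of the selection of maximal annotation pairs in Definition 31, under the interval order of footnote 4; plain integers stand in for monoid annotations, and the names are ours.

```python
# Sketch of Definition 31's selection of maximal annotation pairs under the
# interval order of footnote 4: (a1, a2) <= (b1, b2) iff b1 <= a1 and a2 <= b2,
# i.e. wider intervals are larger.

def interval_leq(p, q):
    (a1, a2), (b1, b2) = p, q
    return b1 <= a1 and a2 <= b2

def maximal(pairs):
    # keep the pairs that are not strictly below any other pair
    return [p for p in pairs
            if not any(p != q and interval_leq(p, q) for q in pairs)]

candidates = [(1, 3), (2, 3), (1, 2), (0, 5)]
print(maximal(candidates))  # [(0, 5)] -- it widens all the others
```

Incomparable pairs (neither interval contains the other) are all kept, which is why $M$ may contain several maximal annotations.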

Having performed the materialization, we will now show how to rewrite annotated objects. Note that we cannot simply take pushouts in the category of annotated objects and legal arrows, since this would result in taking the supremum of annotations, when instead we need the sum (subtracting the annotation of the interface I, analogous to the inclusion-exclusion principle).

**Definition 33 (Abstract rewriting step** $\rightsquigarrow$**).** *Let* $p\colon L \xleftarrow{\varphi_L} I \xrightarrow{\varphi_R} R$ *be a production and let* $A[a_1, a_2]$ *be an annotated abstract object. Furthermore let* $\varphi\colon L \to A$ *be a match of a left-hand side, let* $n_L\colon L \to \langle\langle\varphi, \varphi_L\rangle\rangle$ *be the match obtained via materialization and let* $(a_1', a_2') \in M$ *(as in Definition 31).*

*Then* A[a1, a2] *can be transformed to* B[b1, b2] *via* p *if there are arrows such that the two squares below are pushouts in the base category and* b1, b<sup>2</sup> *are defined as:*

$$b\_i = \mathcal{A}\_{\varphi\_B}(c\_i) + (\mathcal{A}\_{n\_R}(s\_R) - \mathcal{A}\_{n\_R \circ \varphi\_R}(s\_I)) \qquad \text{for } i \in \{1, 2\}$$

*where* c1, c<sup>2</sup> *are maximal annotations such that:*

$$a\_1' \le \mathcal{A}\_{\varphi\_A}(c\_1) + (\mathcal{A}\_{n\_L}(s\_L) - \mathcal{A}\_{n\_L \circ \varphi\_L}(s\_I)) \quad \mathcal{A}\_{\varphi\_A}(c\_2) + (\mathcal{A}\_{n\_L}(s\_L) - \mathcal{A}\_{n\_L \circ \varphi\_L}(s\_I)) \le a\_2'$$

$$\begin{array}{ccc} L[s\_L, s\_L] & \xleftarrow{\varphi\_L} I[s\_I, s\_I] & \xrightarrow{\varphi\_R} R[s\_R, s\_R] \\ {\scriptstyle n\_L}\downarrow \;\text{(PO)} & {\scriptstyle n\_I}\downarrow \;\text{(PO)} & \downarrow{\scriptstyle n\_R} \\ \langle\langle\varphi, \varphi\_L\rangle\rangle[a\_1', a\_2'] & \xleftarrow{\varphi\_A} C[c\_1, c\_2] & \xrightarrow{\varphi\_B} B[b\_1, b\_2] \end{array}$$

*In this case we write* $A[a_1, a_2] \rightsquigarrow^{p,\varphi} B[b_1, b_2]$ *and say that* $A[a_1, a_2]$ *makes an* abstract rewriting step *to* $B[b_1, b_2]$*.*
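For the simplest global annotation, a single counter tracking the total number of nodes, the update of Definition 33 amounts to ordinary inclusion-exclusion arithmetic. The following toy computation, with invented numbers, illustrates it; it is a sketch, not the general monoid-valued construction.

```python
# Toy instance (invented numbers) of the annotation update in Definition 33,
# for the simplest global annotation: a single counter for the total number of
# nodes. Inclusion-exclusion style, the interface is subtracted so that it is
# not counted twice.

def rewrite_bound(c, nodes_R, nodes_I):
    # b = A_{phi_B}(c) + (A_{n_R}(s_R) - A_{n_R . phi_R}(s_I))
    return c + (nodes_R - nodes_I)

# production: |L| = 2 nodes, |I| = 1 node, |R| = 3 nodes
nodes_L, nodes_I, nodes_R = 2, 1, 3
# the materialization bound a' = 6 splits as context c plus (|L| - |I|)
a_prime = 6
c = a_prime - (nodes_L - nodes_I)
print(rewrite_bound(c, nodes_R, nodes_I))  # 7: net gain of one node
```

Removing $L$ minus the interface deletes one node and gluing in $R$ adds two, so an object with six nodes is rewritten to one with seven, matching the bound.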

We will now show soundness of abstract rewriting, i.e., whenever an object X is abstracted by A[a1, a2] and X is rewritten to Y , then there exists an abstract rewriting step from A[a1, a2] to B[b1, b2] such that Y is abstracted by B[b1, b2].

*Assumption:* In the following we will require that the homomorphism property as well as the pushout property for standard annotations hold (cf. Definition 28).

**Proposition 34 (Soundness for** $\rightsquigarrow$**).** *Relation* $\rightsquigarrow$ *is sound in the following sense: let* $X \in \mathcal{L}(A[a_1, a_2])$ *(witnessed via a legal arrow* $\psi\colon X[s_X, s_X] \to A[a_1, a_2]$*), where* $X \stackrel{p,m_L}{\Longrightarrow} Y$*. Then there exists an abstract rewriting step* $A[a_1, a_2] \rightsquigarrow^{p,\psi\circ m_L} B[b_1, b_2]$ *such that* $Y \in \mathcal{L}(B[b_1, b_2])$*.*

#### **5.2 Completeness**

The conditions imposed on the annotations so far are too weak to guarantee completeness, that is, that every object represented by $B[b_1, b_2]$ can be obtained by rewriting an object represented by $A[a_1, a_2]$. This can be seen clearly from the fact that the requirements also hold for the singleton monoid and, as discussed before, the graph structure of $B$ alone is insufficient to characterize the successor objects or graphs.

Hence we will now strengthen our requirements in order to obtain completeness.

*Assumption:* In addition to the assumptions of Sect. 5.1, we will need that subtraction is well-behaved and that the adjunction property, the pushout property, the Beck-Chevalley property (Definition 28) and the isomorphism property (Definition 30) hold.

The global annotations from Example 25 satisfy all these properties. In particular, given an injective graph morphism $\varphi\colon G \hookrightarrow H$, the right adjoint $\mathit{red}_\varphi\colon \mathcal{M}_n^{V_H \cup E_H} \to \mathcal{M}_n^{V_G \cup E_G}$ to $\mathcal{B}^n_\varphi$ is defined as follows: given an annotation $b \in \mathcal{M}_n^{V_H \cup E_H}$, $\mathit{red}_\varphi(b)(x) = b(\varphi(x))$, i.e., $\mathit{red}_\varphi$ simply provides a form of reindexing.

We will now modify the abstract rewriting relation and allow only those abstract annotations for the materialization that reduce to the standard annotation of the left-hand side.

**Definition 35 (Abstract rewriting step** $\to$**).** *Given* $\varphi\colon L \to A$*, assume that* $B[b_1, b_2]$ *is constructed from* $A[a_1, a_2]$ *via the construction described in Definitions 31 and 33, with the modification that the set of annotations, from which the set* $M$ *of maximal annotations of the materialization* $\langle\langle\varphi, \varphi_L\rangle\rangle$ *is taken, is replaced by:*

$$\{(a\_1', a\_2') \in \mathcal{A}(\langle\langle\varphi, \varphi\_L\rangle\rangle)^2 \mid \mathit{red}\_{n\_L}(a\_i') = s\_L \text{ for } i \in \{1, 2\},\ a\_1 \le \mathcal{A}\_\psi(a\_1'),\ \mathcal{A}\_\psi(a\_2') \le a\_2\}.$$

*In this case we write* $A[a_1, a_2] \xrightarrow{p,\varphi} B[b_1, b_2]$*.*

Due to the adjunction property we have $\mathcal{A}_{n_L}(s_L) = \mathcal{A}_{n_L}(\mathit{red}_{n_L}(a_2')) \le a_2'$, and hence the set $M$ of annotations of Definition 35 is a subset of the corresponding set of Definition 33.

*Example 36.* We give a small example of an abstract rewriting step (a more extensive, worked example can be found in the full version [8]). Elements without annotation are annotated by $[0, *]$ by default and those with annotation $[0, 0]$ are omitted. Furthermore, elements in the image of the match and co-match are annotated by the standard annotation $[1, 1]$ to specify the concrete occurrence of the left-hand and right-hand side.

(Figure: the example abstract rewriting step, showing the annotated abstract graph $A$, the production $L \leftarrow I \rightarrow R$, and the resulting annotated graphs.)

The variant of abstract rewriting introduced in Definition 35 can still be proven to be sound, assuming the extra requirements stated above.

**Proposition 37 (Soundness for** $\to$**).** *Relation* $\to$ *is sound in the sense of Proposition 34.*

Using the assumptions we can now show completeness.

**Proposition 38 (Completeness for** $\to$**).** *If* $A[a_1, a_2] \xrightarrow{p,\varphi} B[b_1, b_2]$ *and* $Y \in \mathcal{L}(B[b_1, b_2])$*, then there exists* $X \in \mathcal{L}(A[a_1, a_2])$ *(witnessed via a legal arrow* $\psi\colon X[s_X, s_X] \to A[a_1, a_2]$*) such that* $X \stackrel{p,m_L}{\Longrightarrow} Y$ *and* $\varphi = \psi \circ m_L$*.*

Finally, we can show that annotated graphs of this kind are expressive enough to construct a strongest post-condition. If we allowed several annotations per object, as in [9], we could represent the language by a single (multiply) annotated object.

**Corollary 39 (Strongest post-condition).** *Let* $A[a_1, a_2]$ *be an annotated object and let* $\varphi\colon L \to A$*. We obtain (several) abstract rewriting steps* $A[a_1, a_2] \xrightarrow{p,\varphi} B[b_1, b_2]$*, where we always obtain the same object* $B$*. (*$B$ *depends on* $\varphi$*, but not on the annotation.) Now let* $N = \{(b_1, b_2) \mid A[a_1, a_2] \xrightarrow{p,\varphi} B[b_1, b_2]\}$*. Then*

$$\bigcup\_{(b\_1, b\_2) \in N} \mathcal{L}(B[b\_1, b\_2]) = \{ Y \mid \exists (X \in \mathcal{L}(A[a\_1, a\_2]), \text{witnessed by } \psi),\ (L \xrightarrow{m\_L} X)\colon\ \varphi = \psi \circ m\_L \wedge X \stackrel{p, m\_L}{\Longrightarrow} Y \}$$

#### **6 Conclusion**

We have described a rewriting framework for abstract graphs that also applies to objects in any topos, based on existing work for graphs [1,2,4,27,28,31]. In particular, we have given a blueprint for materialization in terms of the universal property of partial map classifiers. This is a first theoretical milestone towards shape analysis as a general static analysis method for rule-based systems with graph-like objects as states. Soundness and completeness results for the rewriting of abstract objects with annotations in an ordered monoid provide an effective verification method for the special case of graphs. We plan to implement the materialization construction and the computation of rewriting steps of abstract graphs in a prototype tool.

The extension of annotations with logical formulas is the natural next step, which will lead to a more flexible and versatile specification language, as described in previous work [30,31]. The logic can possibly be developed in full generality using the framework of nested application conditions [18,23] that applies to objects in adhesive categories. This logical approach might even reduce the proof obligations for annotation functors. Another topic for future work is the integration of widening or similar approximation techniques, which collapse abstract objects and ideally lead to finite abstract transition systems that (over-)approximate the typically infinite transition systems of graph transformation systems.

# **References**


G., Taentzer, G. (eds.) Formal Methods in Software and Systems Modeling. LNCS, vol. 3393, pp. 293–308. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31847-7_17


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Two-Way Parikh Automata with a Visibly Pushdown Stack**

Luc Dartois<sup>1(B)</sup>, Emmanuel Filiot<sup>2</sup>, and Jean-Marc Talbot<sup>3</sup>

<sup>1</sup> LACL, Université Paris-Est Créteil, Créteil, France ldartois@lacl.fr

<sup>2</sup> Université Libre de Bruxelles, Brussels, Belgium

<sup>3</sup> LIM, Aix-Marseille Université, Marseille, France

**Abstract.** In this paper, we investigate the complexity of the emptiness problem for Parikh automata equipped with a pushdown stack. Pushdown Parikh automata extend pushdown automata with counters which can only be incremented and an acceptance condition given as a semi-linear set, which we represent as an existential Presburger formula over the final values of the counters. We show that the non-emptiness problem is NP-complete both in the deterministic and the non-deterministic case. If the input head can move in a two-way fashion, emptiness becomes undecidable, even if the pushdown stack is visibly driven by the input and the automaton is deterministic. We define a restriction, called the single-use restriction, to recover decidability in the presence of two-wayness when the stack is visibly driven. This syntactic restriction enforces that any transition which increments at least one dimension is triggered only a bounded number of times per input position. Our main contribution is to show that non-emptiness of two-way visibly pushdown Parikh automata which are single-use is NExpTime-complete. We finally give applications to decision problems for expressive transducer models from nested words to words, including the equivalence problem.

#### **1 Introduction**

*Parikh automata.* Since the classical automata-based approach to modelchecking [28], finite automata have been extended in many ways to tackle the automatic verification of more realistic and powerful systems against more expressive specifications. For instance, they have been extended to pushdown systems [3,26,30], concurrent systems [5], and systems with counters or specifications with arithmetic constraints have been the focus of many works in verification [7,11,15–18,23].

Along this line of work, Parikh automata (or PA), introduced in [22], are an important instance of automata extension with arithmetic constraints. They are automata on finite words whose transitions are equipped with counter operations. The counters can only be incremented, and do not influence the run (enabling a transition requires no test on counter values), but the acceptance of a run is defined by the membership of the final counter valuations in some semi-linear set $S$. Expressivity of PAs goes beyond regularity: the language $L = \{w \mid |w|_a = |w|_b\}$ of words having the same number of $a$'s and $b$'s is realised by a simple automaton counting the numbers of $a$'s and $b$'s in counters $x_1$ and $x_2$ respectively, with accepting condition given by the linear set $\{(i, i) \mid i \in \mathbf{N}\}$. Semi-linear sets can be defined by formulas in existential Presburger arithmetic, i.e., first-order formulas with equality and sum predicates over integers, whose free variables are evaluated by the counter values computed by the run.
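The example language above can be sketched as a minimal executable Parikh automaton: one state, two write-only counters, and a final semi-linear test. The encoding (a plain function rather than an automaton data structure) is our simplification.

```python
# Minimal executable sketch of the Parikh automaton from the text for
# L = {w : |w|_a = |w|_b}: two write-only counters and the final semi-linear
# test {(i, i) | i in N}.

def run_parikh(word):
    x1 = x2 = 0
    for letter in word:
        if letter == "a":
            x1 += 1          # counters never influence the run ...
        elif letter == "b":
            x2 += 1
        else:
            return False     # word not over the alphabet {a, b}
    return x1 == x2          # ... they are only tested at the very end

print(run_parikh("abba"))  # True
print(run_parikh("aab"))   # False
```

Note that the counters are never consulted during the run, matching the defining feature of Parikh automata: acceptance is decided solely on the final valuation.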

A central problem in automata theory is the non-emptiness problem: does the automaton accept at least one input? Although PAs go beyond regular languages, they retain relatively good algorithmic properties. The emptiness problem is decidable, and it is NP-c [12]. The hardness holds even if the semi-linear set is represented as a set of generator vectors. Motivated by applications in transducer theory for well-nested words, we investigate in this article extensions of Parikh automata with a pushdown stack.

*First contribution: pushdown Parikh automata.* As a first contribution, we study the complexity of the emptiness problem for Parikh automata with a pushdown store. Just as Parikh automata extend finite automata with counter operations and an acceptance condition given as a semi-linear set, *pushdown Parikh automata* extend pushdown automata in the same way. We show that adding a stack comes for free with respect to the emptiness problem, which remains, as for stack-free Parikh automata, NP-c. However, in this case we are able to strengthen the lower bound: it remains NP-hard even if there are only two counters, the automaton is deterministic, and the Presburger formula only tests for equality of these two counters. In the stack-free setting, an unfixed number of counters is necessary to get such a lower bound.

**Contribution 1.** The emptiness problem for pushdown Parikh automata (PPA) is NP-c. The lower bound holds even if the automaton is deterministic, has only two counters whose operations are encoded in unary, and they are eventually tested for equality.

*Second contribution: adding two-wayness.* We investigate the complexity of pushdown Parikh automata when the input head is allowed to move in two directions. It is not difficult to see that in that case emptiness becomes undecidable since, already without counters, one can simulate the intersection of two deterministic pushdown automata by performing two passes over the input (visiting each input position at most three times). We consider a first restriction on the stack behaviour, which is required to be *visibly* driven by the input.

A pushdown stack is called visibly if it is driven by the type of letters it reads, which can be either call symbols, return symbols or internal symbols. Words formed over such a structured alphabet are called nested words, and well-nested words if additionally the call/return structure of the word is well-balanced, such as in the following example:

(Figure: a well-nested word over calls $c$ and returns $r$, with arcs connecting each call to its matching return.)

Automata for nested words, called *visibly pushdown automata* (or VPA), have been introduced in [2]. They are pushdown automata whose stack behaviour is constrained by the input in the following way. Upon reading a call symbol, exactly one symbol is pushed onto the stack. Upon reading a return symbol, exactly one symbol is popped from it. Upon reading an internal symbol, the stack is left unchanged. Hence, the symbol that is pushed while reading a given call symbol is popped while reading its matching return symbol. Consequently, visibly pushdown automata enjoy nice properties, such as closure under Boolean operations and determinisation.
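The stack discipline just described can be sketched as a short executable check: calls push exactly one symbol, returns pop exactly one, internals leave the stack untouched. States are omitted, and the one-letter alphabet per type (`c`, `r`, `i`) is our choice for the sketch.

```python
# Hedged sketch of the (one-way) visibly pushdown stack discipline: a call
# pushes exactly one symbol, a return pops exactly one, an internal leaves the
# stack untouched. We only check the stack discipline and acceptance on empty
# stack.

CALLS, RETURNS, INTERNALS = set("c"), set("r"), set("i")

def vpa_accepts(word):
    stack = []
    for letter in word:
        if letter in CALLS:
            stack.append(letter)   # push on call
        elif letter in RETURNS:
            if not stack:
                return False       # unmatched return
            stack.pop()            # pop on return
        elif letter not in INTERNALS:
            return False           # letter outside the structured alphabet
    return not stack               # accept iff every call was matched

print(vpa_accepts("ccirrcr"))  # True: well-nested
print(vpa_accepts("crr"))      # False: unmatched return
```

Because the stack height is a function of the input position, two VPA can be run in lockstep, which is the intuition behind their closure under Boolean operations.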

VPA have been extended to two-way VPA (2VPA) [8] with the following stack constraints: in backward reading mode, the roles of the return and call symbols regarding the stack are inverted: when reading a call, exactly one symbol is popped from the stack, and when reading a return, one symbol is pushed. It was shown in [8] that adding this visibly condition to two-way pushdown automata allows one to recover decidability of the emptiness problem. However, for Parikh acceptance this restriction is not sufficient. Indeed, by encoding Diophantine equations, we show the following undecidability result:

**Contribution 2.** The emptiness problem for two-way visibly pushdown Parikh automata (2VPPA) is undecidable.

*Single-use property.* The problem is that by combining two-wayness and a pushdown stack, it is possible to encode polynomially, and even exponentially, large counter values with respect to the length of the input word. We therefore consider the single-use restriction, which appears in several transducer models [6,8,10] and by which it is possible to keep a linear behaviour for the counters. Informally, a *single-use* two-way machine bounds the size of the production per input position. It is syntactically enforced by asking that transitions which strictly increment at least one counter are triggered at most once per input position. Our main result is the decidability of 2VPPA emptiness under the single-use restriction, with tight complexity.

**Contribution 3 (Main).** The emptiness problem for two-way single-use visibly pushdown Parikh automata (2VPPAsu) is NExpTime-c. The hardness holds even if the automaton is deterministic, has only two counters whose operations are encoded in unary, and they are eventually tested for equality.

To prove the upper bound, we show that two-wayness can be removed from single-use 2VPPA, at the price of one exponential. In other words, single-use 2VPPA and VPPA have the same expressive power, although the former model can be shown to be exponentially more succinct. The lower bound is obtained by encoding the succinct variant of the subset sum problem, based on a reduction which uses the fact that, by combining the pushdown and two-way features, single-use 2VPPA can encode doubly exponential values $2^{2^n}$ with a polynomial number of states (in $n$).


**Fig. 1.** Complexity of the emptiness of different Pushdown Parikh Automata. All results hold for deterministic and non-deterministic machines.

**Contribution 4 (Applications).** As an application, we give an elementary upper bound (NExpTime) for the equivalence problem of functional single-use two-way visibly pushdown transducers [8], for which an ExpTime lower bound was known. This transducer model defines transductions from well-nested words to words and, as shown in [8], such transducers are well-suited to define XML transformations, have the same expressive power as Courcelle's MSO-transducers [6] (cast to well-nested words), and admit a memory-efficient evaluation algorithm. We also provide two other new results on single-use 2VPT (not necessarily functional). First, we show that given a positive integer k, it is decidable whether a single-use 2VPT produces at most k different output words per input (the k-valuedness problem). Then, we show the decidability of a typechecking problem: given a single-use 2VPT T and a finite (stack-free) Parikh automaton P, it is decidable whether the codomain of T has a non-empty intersection with P. This makes it possible, for instance, to decide whether a single-use 2VPT produces only well-nested words and thus describes a transformation from well-nested words to well-nested words, since the property of a word being non-well-nested is definable, as we show, by a Parikh automaton.

*Finite-visit vs. single-useness.* The single-use property is more general than the more classical *finite-visit* restriction, used for instance in [9,19]: the latter requires visiting any input position a (machine-dependent) constant number of times, while single-useness only bounds the number of visits by producing transitions. Although, as a consequence of our results, single-use and finite-visit 2VPPA have the same expressive power, this extra modelling feature is desirable, for instance when using 2VPPA to test properties of 2VPT: single-use 2VPT are strictly more expressive than finite-visit ones, and this relaxation is crucial to capture MSO transductions [8]. Moreover, we somehow get it for free: we show that the NExpTime lower bound also holds for finite-visit 2VPPA. Finally, we note that as we deal with single-use machines rather than finite-visit ones, the usual ingredient for going from two-way to one-way, namely memorizing crossing sections of states, is not sufficient here, since we cannot bound the size of these crossing sections.

*Related work.* Parikh automata are closely related to reversal-bounded counter machines [18]. In fact, both models are equally expressive in the non-deterministic case [22]. The difference of expressive power in the deterministic case is due to the fact that counter machines can perform tests on their counters that influence the run, while counters in Parikh automata only matter at the end of the run. Several extensions of reversal-bounded counter machines have been studied, whether two-way or equipped with a (visibly) pushdown stack. However, to the best of our knowledge, the combination of the two features has never been studied (see [19] for a survey). It is possible to define a model of single-use reversal-bounded two-way visibly pushdown counter machines, where the single-useness is put on transitions that modify the counters. This model is expressively equivalent to 2VPPAsu in the non-deterministic case and, thanks to our result, has a decidable emptiness problem. The non-emptiness problem for reversal-bounded (one-way) pushdown counter machines with fixed numbers of counters and reversals is known to be in NP [13] and NP-hard [16]. Converting PPA into reversal-bounded counter machines would yield an unfixed number of counters. Our NP lower bound for PPA however follows ideas of [16] about encoding, using the stack, integers n with O(log(n)) states and stack symbols.

Two-way (stack-free) reversal-bounded counter machines, even deterministic ones, are known to have an undecidable emptiness problem [19]. Decidability is recovered by taking the finite-visit restriction [19]. Our result on 2VPPAsu entails the decidability of emptiness for two-way reversal-bounded counter machines which are single-use.

Finally, all the decidability results we prove for two-way visibly pushdown transducers were already known in the one-way case [13]. Two-way visibly pushdown transducers, which are strictly more expressive, can also be seen as a model of unranked tree-to-word transducers, modulo tree linearisation. To the best of our knowledge, this is the first model of unranked tree-to-word transducers for which k-valuedness and codomain well-nestedness are shown to be decidable. Another model, introduced in [1], is known to be expressively equivalent to 2VPTsu [8] and, in the functional case, has an equivalence problem decidable in NExpTime. However, translating 2VPTsu to this model requires an exponential blow-up, yielding a worse complexity for equivalence testing.

*Structure.* Section 2 introduces the computational models used, the proof of the lower bound for 2VPPAsu is given in Sect. 3 and the upper bound in Sect. 4. Finally, some applications of the main theorem to transducers are given in Sect. 5.

### **2 Two-Way Visibly Pushdown (Parikh) Automata**

In this section, we first recall the definition of two-way visibly pushdown automata and later on extend them to two-way visibly pushdown Parikh automata.

We consider a structured alphabet $\Sigma$ defined as the disjoint union of call symbols $\Sigma_c$, return symbols $\Sigma_r$ and internal symbols $\Sigma_i$. The set of words over $\Sigma$ is $\Sigma^*$. As usual, $\epsilon$ denotes the empty word. Amongst nested words, the set of well-nested words $\Sigma^*_{wn}$ is defined as the least set such that $\Sigma_i \cup \{\epsilon\}$ is included in $\Sigma^*_{wn}$ and, if $w_1, w_2 \in \Sigma^*_{wn}$, then both $w_1w_2$ and $cw_1r$ (for all $c \in \Sigma_c$ and $r \in \Sigma_r$) belong to $\Sigma^*_{wn}$.
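The inductive definition above can be read directly as a generator of all well-nested words up to a given length. The following sketch uses a one-call/one-return/one-internal alphabet $\{c, r, i\}$ (our choice for illustration) and closes the base set under the two formation rules.

```python
# Generator of all well-nested words up to length k, read off the inductive
# definition in the text, over the alphabet {'c', 'r', 'i'}.

def well_nested_upto(k):
    words = {"", "i"}          # base: internal symbols and the empty word
    changed = True
    while changed:             # close under w1.w2 and c.w1.r, up to length k
        changed = False
        for w1 in list(words):
            for w2 in list(words):
                for w in (w1 + w2, "c" + w1 + "r"):
                    if len(w) <= k and w not in words:
                        words.add(w)
                        changed = True
    return words

print(sorted(well_nested_upto(2)))  # ['', 'cr', 'i', 'ii']
```

Since every well-nested word decomposes into strictly shorter well-nested pieces, restricting the closure to length at most $k$ loses nothing.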

When dealing with two-way machines, we assume the structured alphabet $\Sigma$ to be extended to $\overline{\Sigma}$ by adding left and right marker symbols $\vdash$ and $\dashv$ to $\Sigma_c$ and $\Sigma_r$ respectively, and we consider words in the language $\vdash\Sigma^*\dashv$.

**Definition 1.** *A* two-way visibly pushdown automaton *(*2VPA *for short)* $A$ *over* $\overline{\Sigma}$ *is given by* $(Q, q_I, F, \Gamma, \delta)$ *where* $Q$ *is a finite set of states,* $q_I \in Q$ *is the initial state,* $F \subseteq Q$ *is a set of final states and* $\Gamma$ *is a finite stack alphabet. Given the set* $\mathbf{D} = \{\leftarrow, \rightarrow\}$ *of directions, the transition relation* $\delta$ *is defined by* $\delta_{push} \cup \delta_{pop} \cup \delta_{int}$ *where*


*Additionally, we require that for any states* $q, q'$ *and any stack symbol* $\gamma$*, if* $(q, \leftarrow, \vdash, \gamma, q', d) \in \delta_{pop}$ *then* $d = \rightarrow$*, and if* $(q, \rightarrow, \dashv, \gamma, q', d) \in \delta_{pop}$ *then* $d = \leftarrow$*, ensuring that the reading head stays within the bounds of the input word.*

Informally, a 2VPA has a reading head pointing between symbols (and possibly on the left of $\vdash$ or the right of $\dashv$). A configuration of the machine is given by a state, a direction $d$ and a stack content. The next symbol to be read is on the right of the head if $d = \rightarrow$ and on the left if $d = \leftarrow$. Note that when reading the left marker $\vdash$ from right to left (resp. the right marker $\dashv$ from left to right), the next direction can only be $\rightarrow$ (resp. $\leftarrow$). The structure of the alphabet induces the behavior of the machine regarding the stack when reading the input word: when reading on the right, a call symbol leads to pushing one symbol onto the stack while a return symbol pops one symbol from the stack. When reading on the left, the dual behaviour holds. In either direction, internal transitions from $\delta_{int}$ read internal symbols and do not affect the stack; hence, at a given position in the input word, the height of the stack is the same at each visit of that position in the run of the machine. The triggering of a transition leads to the update of the state of the machine, the future direction as well as the stack content. For a direction $d$, a natural $i$ ($0 \le i \le |w|$) and a word $w$, we denote by


Note that when switching directions (i.e. when the direction of the first part of the transition is different from the second part), we read the same letter twice. This ensures the good behavior of the stack: as reading a call letter from left to right pushes a stack symbol, we need to pop it if we start moving from right to left.
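The direction-dependent stack discipline described above can be summarized as a tiny decision table; the string encoding of directions and actions is our own.

```python
# Sketch of the direction-dependent stack discipline of 2VPA: moving right, a
# call pushes and a return pops; moving left, the roles are inverted;
# internals never touch the stack.

def stack_action(direction, letter_type):
    if letter_type == "internal":
        return "nop"
    push = (direction == "right") == (letter_type == "call")
    return "push" if push else "pop"

print(stack_action("right", "call"))   # push
print(stack_action("left", "call"))    # pop  (inverted in backward mode)
print(stack_action("left", "return"))  # push
```

This inversion is exactly what keeps the stack height a function of the input position, regardless of the head's direction.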

Formally, a stack $\sigma$ is a finite word over $\Gamma$. The empty stack/word over $\Gamma$ is denoted $\bot$. For a word $w$ from $\vdash\Sigma^*\dashv$ and a 2VPA $A = (Q, q_I, F, \Gamma, \delta)$, a *configuration* $\kappa$ of $A$ is a tuple $(q, i, d, \sigma)$ where $q \in Q$, $0 \le i \le |w|$, $d \in \mathbf{D}$ and $\sigma$ is a stack. A *run* of $A$ on a word $w$ is a finite sequence $\rho$ from $K(\delta K)^*$, where $K$ is the set of all configurations $\kappa$ (that is, a sequence starting and ending with a configuration and alternating between configurations and transitions); a run $\rho$ is of the form $(q_0, i_0, d_0, \sigma_0)\tau_1(q_1, i_1, d_1, \sigma_1)\tau_2 \ldots \tau_\ell(q_\ell, i_\ell, d_\ell, \sigma_\ell)$ where for all $0 \le j < \ell$, we have:


Note that any configuration is actually a run on the empty word $\epsilon$. The initial configuration is $(q_I, 0, \rightarrow, \bot)$. A configuration $(q, i, d, \bot)$ is *final* if $q \in F$ and $i$ is the last position. A run for the word $w$ is accepting if its first configuration is initial and its last configuration is final. A two-way visibly pushdown automaton $A$ is:


The size of a 2VPA is the number of states times the size of the stack alphabet. For an automaton $A$, we denote by $L(A)$ the language recognized by $A$.

**Lemma 1 (**[8]**).** *Given a* 2VPA A*, deciding if* L(A) *is empty is* ExpTime*-complete.*

*Parikh automata.* Parikh automata were introduced in [22]. Informally, they are automata with counters that can only be incremented and that do not influence the transition relation. Acceptance of a run is decided by evaluating a Presburger formula whose free variables are set to the final counter values. In our setting, a *Presburger formula* is a positive formula $\psi(x_1,\dots,x_n) = \exists y_1 \dots y_m\, \varphi(x_1,\dots,x_n, y_1,\dots,y_m)$ such that $\varphi$ is a boolean combination of atoms $s + s' \le t + t'$, for $s, s', t, t' \in \{0, 1, x_1,\dots,x_n, y_1,\dots,y_m\}$. For a set $S$ and some positive number $m$, we denote by $S^m$ the set of all mappings from $[1 \dots m]$ to $S$. If $(s_1,\dots,s_m)$ and $(t_1,\dots,t_m)$ are two tuples of $S^m$ and $+$ is a binary operation on $S$, we extend $+$ to $S^m$ element-wise, i.e. $(s_1,\dots,s_m)+(t_1,\dots,t_m)=(s_1 + t_1,\dots,s_m + t_m)$.

**Definition 2.** *A* two-way visibly pushdown Parikh automaton *(*2VPPA *for short) is a tuple* $P = (A, \lambda, \varphi)$ *where* $A$ *is a* 2VPA *and, for some natural* dim*,* $\lambda$ *is a mapping from* $\delta$ *to* $\mathbb{N}^{dim}$*, the set of vectors of* dim *naturals, and* $\varphi(x_1,\dots,x_{dim})$ *is a Presburger formula with* dim *free variables.*

When clear from the context, we may omit the free variables of the Presburger formula and simply write $\varphi$. A run of a 2VPPA is a run of its underlying 2VPA. We extend the mapping $\lambda$ canonically to runs. For a run $\rho$ of the form $(q_0, i_0, d_0, \sigma_0)\,\tau_1\,(q_1, i_1, d_1, \sigma_1)\,\tau_2 \dots \tau_\ell\,(q_\ell, i_\ell, d_\ell, \sigma_\ell)$, we set

$$
\lambda(\rho) = \lambda(\tau_1) + \lambda(\tau_2) + \dots + \lambda(\tau_\ell).
$$

We recall that a single configuration $c$ is a run over the empty word $\varepsilon$. For such a run $c$, we set $\lambda(c) = 0^{dim}$. A run $(q_0, i_0, d_0, \sigma_0)\,\tau_1\,(q_1, i_1, d_1, \sigma_1)\,\tau_2 \dots \tau_\ell\,(q_\ell, i_\ell, d_\ell, \sigma_\ell)$ is accepted if $(q_0, i_0, d_0, \sigma_0)$ and $(q_\ell, i_\ell, d_\ell, \sigma_\ell)$ are respectively an initial and a final configuration of the underlying automaton and, for $\lambda(\rho) = (n_1,\dots,n_{dim})$, $[x_1 \leftarrow n_1,\dots,x_{dim} \leftarrow n_{dim}] \models \varphi(x_1,\dots,x_{dim})$. The language $L(P)$ is the set of words which admit an accepting run. We define the set of values computed by $P$ as $\mathit{Val}(P) = \{\lambda(\rho) \mid \rho$ a valid run of the underlying automaton of $P\}$. We define the size of $P$ as the size of $A$ plus the number of symbols in $\varphi$ plus $|\delta| \cdot dim \cdot \log(W)$, where $W$ is the maximal value occurring in the codomain of $\lambda$.
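Concretely, the acceptance condition can be sketched as follows; the run encoding and the predicate standing in for the Presburger formula are illustrative conventions, not the paper's notation.

```python
# Sketch of Parikh acceptance: each transition of a run contributes a vector,
# the vectors are summed, and the acceptance formula is evaluated on the sum.

def parikh_accepts(run_vectors, phi, dim):
    """run_vectors: the values lambda(tau_j) along a run; phi: a predicate on
    the summed tuple, standing in for the Presburger formula."""
    total = tuple(sum(v[i] for v in run_vectors) for i in range(dim))
    return phi(total)

# Dimension 2 with phi(x1, x2) = (x1 == x2), as in Theorems 2 and 4 below.
vectors = [(1, 0), (0, 1), (1, 0), (0, 1)]
print(parikh_accepts(vectors, lambda t: t[0] == t[1], 2))  # True
```

Note that a run consisting of a single configuration contributes the empty list of vectors, whose sum is $0^{dim}$, matching the convention $\lambda(c) = 0^{dim}$ above.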

It is deterministic (resp. one-way), denoted D2VPPA (resp. VPPA), if its underlying automaton is deterministic (resp. one-way). It is known from [4] that DPA (i.e. deterministic one-way stack-free Parikh automata in our setting) are strictly less expressive than their nondeterministic counterpart. As a separating example, they exhibit the language $L = \{w \mid w_{\#_a(w)} = b\}$, i.e. all words $w$ such that, if $n$ is the number of $a$'s in $w$, the letter at the $n$th position is a $b$. Note that even in the two-way case, a deterministic machine recognizing $L$ needs either to have access, during the computation, to the number of $a$'s, or to be able to store, in counters, the position of each $b$. The first solution is impossible since Parikh automata only access their counters at the end of the run, and the second is also impossible since there are only finitely many counters; hence this language is not definable by a D2VPPA either, extending the separation between deterministic and nondeterministic Parikh automata.

*Example 1.* As an example, we give a deterministic 2VPPA $P$ that, given an input $i^n c^k i^\ell r^k$ with $c$, $i$, $r$ in $\Sigma_c$, $\Sigma_\iota$ and $\Sigma_r$ respectively, accepts if $k = \ell$ and $n = k^2$. The 2VPPA $P$ uses 4 variables $x_n$, $x_k$, $x_\ell$ and $y$. The first 3 variables count the length of the first block of $i$'s, the number of calls and the length of the second block of $i$'s respectively. The handling of these 3 variables is straightforward and can be done in a single pass over the input. The fourth variable $y$ computes the product $k \cdot \ell$, and doing so is more involved. The part of the underlying 2VPA of $P$ handling $y$ is given in Fig. 2. On this part, the mapping $\lambda$ simply increments the counter on transitions going to state 2 (i.e. on reading the letters $i$ from left to right). It makes as many passes over the block of internal symbols in state 2 as there are call symbols, and the state of the stack upon reading $i^\ell$ for the $j$th time is $1^j 0^{k-j}$. Finally, the accepting formula $\varphi$ of $P$ is defined by $x_n = y \wedge x_k = x_\ell$. Note that this widget allows us to compute the set $\{(k^2, k, k, k^2) \mid k \in \mathbb{N}\}$, which is not semilinear.

**Fig. 2.** A 2VPPA reading words $c^k i^\ell r^k$ and making $k$ passes over $i^\ell$, adding $k \cdot \ell$ to the variable $y$. The transitions have two components, the first being the letter read and the second the stack operation. There is no stack operation upon reading internal symbols. The variable $y$ is incremented in transitions going to state 2 only.
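The counting behaviour of this widget can be mimicked directly (a hypothetical simulation, not the automaton itself): the stack drives $k$ passes over the block of $\ell$ internal symbols, and $y$ is incremented once per symbol per pass.

```python
# Simulation of the multiplication widget of Example 1: k passes over a block
# of ell internal symbols, incrementing y on each symbol read left to right.

def widget_value(k, ell):
    y = 0
    for _pass in range(k):        # one pass per call symbol, driven by the stack
        for _symbol in range(ell):
            y += 1                # lambda increments on transitions into state 2
    return y

# With ell = k, the four counters reach (x_n, x_k, x_ell, y) = (k*k, k, k, k*k),
# matching the non-semilinear set {(k^2, k, k, k^2) | k in N}.
print(widget_value(3, 3))  # 9
```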

As we have seen in the previous example, the set $\mathit{Val}(P)$ is not necessarily semi-linear, even for a D2VPPA $P$. We use this fact to encode Diophantine equations and obtain the following undecidability result:

#### **Theorem 1.** *The emptiness problem of* D2VPPA *is undecidable.*

*Single-useness.* In order to recover decidability, we adapt to Parikh automata the notion of single-useness introduced in [8]. Simply put, a 2VPPA is *single-use* (denoted 2VPPAsu) if the transitions that affect the variables can only be taken once at any given input position, thus effectively bounding the size of the variables linearly in the size of the input. Formally, a state $p$ of a 2VPPA $P$ is *producing* if there exists a transition $t$ from $p$ on some symbol with $\lambda(t) \neq 0^{dim}$. A 2VPPA is single-use if for every input $w$ and every accepting run $\rho$ over $w$, there do not exist two different configurations $(p, i, d, \sigma)$ and $(p, i, d, \sigma')$ with $p$ a producing state, meaning that $\rho$ does not visit any position twice in the same direction in a producing state of $P$. This property is a semantic restriction of the model; however, since it is regular, it can equivalently be enforced syntactically. Moreover, deciding single-useness of a 2VPPA is ExpTime-complete (see [8] for the same result on transducers). Note that the Parikh automaton given in Example 1 is not single-use, since it passes over the second block of internal letters $i$ in state 2 as many times as there are call symbols. In the following, we prove that 2VPPAsu have the same expressiveness as VPPA, while being exponentially more succinct. In particular, this equivalence implies, by Parikh's Theorem [24], the semi-linearity of $\mathit{Val}(P)$ for any 2VPPAsu $P$.

#### **3 Emptiness Complexity**

We show that the non-emptiness problem for VPPA is NP-complete. We actually show the upper bound for the strictly more expressive *Pushdown Parikh Automata* (PPA), i.e. VPPA without the visibility restriction. While decidability was known [20,21], the precise complexity was, to the best of our knowledge, unknown. Let us also remark that the model and the proof are similar to the proof of NP-completeness of $k$-reversal pushdown systems from [16]. However, it is adapted here to Parikh automata as well as to deterministic machines, which was not the case in [16].

**Theorem 2.** *The non-emptiness problem for* VPPA *and* PPA *is* NP*-complete. The complexity bounds hold even if the automata are deterministic, with a fixed dimension* 2*, tuples of values in* $\{0, 1\}^2$ *and a fixed Presburger formula* $\varphi(x_1, x_2) \equiv x_1 = x_2$*.*

*From* 2VPPAsu *to* VPPA*.* From a two-way visibly pushdown Parikh automaton satisfying the single-useness restriction, one can build an equivalent one-way visibly pushdown Parikh automaton. The construction induces an exponential blow-up, which cannot be avoided, as with most constructions from two-way to one-way machines.

**Theorem 3.** *For any* 2VPPAsu A*, one can construct a* VPPA B *whose size is at most exponential in the size of* A *and such that L(A)=L(B). Moreover, the procedure can be done in exponential time.*

*Proof (Sketch).* The goal is to correctly guess, at once, all the transitions taken by a run of the two-way machine. More precisely, the one-way machine guesses the behavior of the two-way machine on each well-nested subword of the input, i.e. a set of partial runs over that subword. A partial run is a pair from $Q \times \{\leftarrow, \rightarrow\}$. Informally, it describes a maximal subrun over a subword of the input. We call these sets of partial runs *profiles*, and we define relations $C$ and $N_{c,r}$ to describe compatible profiles. Formally, the relation $C \subseteq \mathcal{P}^3$ is the *concatenation* relation, defined as the set of triples $(P, P', P'')$ such that there exists a word $u = u_1 v v' u_2$, where $v$ and $v'$ are well-nested subwords of $u$, and a run $\rho$ on $u$ such that $P$ (resp. $P'$) is the profile of $v$ in $\rho$ (resp. of $v'$) and $P''$ is the profile of $vv'$ in $\rho$. Similarly, the relation $N_{c,r} \subseteq \mathcal{P}^2$, for $c$, $r$ call and return letters respectively, is the $cr$*-nesting* relation, defined as the set of pairs $(P, P')$ such that there exists a word $u = u_1 c v r u_2$, where $v$ is well-nested, and a run $\rho$ of $A$ on $u$ such that $P$ is the profile of $v$ in $\rho$ and $P'$ is the profile of $cvr$ in $\rho$. We prove that these relations are computable in exponential time.

Given these relations, we can compute a VPPA $B$ whose runs are in bijection with the runs of $A$. Moreover, from a run of $B$ we can recover which transitions are effectively taken at each position by the corresponding run of $A$. The increment function then simply performs, at once, all the increments done by the run at a given position. Since the operation is addition over the integers, it is commutative and the variables are updated exactly as they were by the run of $A$. Note that we only recover which transitions are taken, and not how many times they are taken, which can depend on the size of the input. However, since $A$ is single-use, we only have to add each non-zero transition once, which gives the result.

As a direct corollary of Theorems 3 and 2, we get the following.

**Corollary 1.** *The emptiness of* 2VPPAsu *can be decided in* NExpTime*.*

### **4 NExpTime-Hardness**

In this section, we show that the problem of deciding whether the language of a 2VPPAsu is non-empty is hard for NExpTime. Moreover, we show that this hardness depends neither on the fact that we allow existential Presburger formulas, nor on the vector dimension, nor on the fact that the values in the tuples are encoded in binary.

**Theorem 4.** *The non-emptiness problem for* 2VPPAsu *is* NExpTime*-hard. The result holds even if the automaton is deterministic, of dimension* 2*, with counter updates in* $\{0, 1\}$*, the Presburger formula is* $\varphi(x_1, x_2) \equiv x_1 = x_2$*, and it is finite-visit.*

*Succinct Subset Sum Problem.* We reduce from the succinct subset sum problem (SSSP), which is NExpTime-hard [16]. Let us define SSSP. Let $m, k \ge 1$, and let $X = \{x_1,\dots,x_k\}$ and $Y = \{y_1,\dots,y_m\}$ be sets of Boolean variables. Let $\theta$ be a Boolean formula over $X \cup Y$. Any word $v \in \{0,1\}^{k+m}$ naturally defines a valuation of $X \cup Y$ (the first bit of $v$ is the value of $x_1$, etc.). We denote by $\theta[v] \in \{0,1\}$ the truth value of $\theta$ under the valuation $v$. The formula $\theta$ defines $2^k$ non-negative integers $a_1,\dots,a_{2^k}$, each with $2^m$ bits, as follows:

$$a_i = \theta[b_i d_1]\cdot 2^{2^m - 1} + \theta[b_i d_2]\cdot 2^{2^m - 2} + \dots + \theta[b_i d_{2^m}]\cdot 2^{0}$$

where $b_i$ is the binary encoding over $k$ bits of $i$, and $d_1,\dots,d_{2^m}$ is the lexicographic enumeration of $\{0,1\}^m$, starting from $0^m$. Note that for all $i \in \{1,\dots,2^k\}$, $a_i \in \{0,\dots,2^{2^m}-1\}$. The *Succinct Subset Sum Problem* asks, given $X$, $Y$ and $\theta$, whether there exists $J \subseteq \{1,\dots,2^k-1\}$ such that $\sum_{j \in J} a_j = a_{2^k}$.
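For tiny parameters the definition can be checked by brute force. The following sketch uses a plain Python predicate in place of the Boolean formula $\theta$, and encodes $i - 1$ over $k$ bits (a convention chosen for the sketch, so that $i = 2^k$ also fits in $k$ bits).

```python
from itertools import product

# Brute-force sketch of SSSP for tiny k, m. theta: any predicate on k + m bits.

def a_value(i, k, m, theta):
    """a_i: the 2^m bits theta[b_i d_1] ... theta[b_i d_{2^m}], most significant
    first, where d_1..d_{2^m} enumerate {0,1}^m lexicographically."""
    b_i = tuple(((i - 1) >> (k - 1 - p)) & 1 for p in range(k))
    val = 0
    for d in product((0, 1), repeat=m):
        val = 2 * val + (1 if theta(b_i + d) else 0)
    return val

def sssp(k, m, theta):
    """Is there J subset of {1,...,2^k - 1} with sum_{j in J} a_j = a_{2^k}?"""
    target = a_value(2 ** k, k, m, theta)
    sums = {0}
    for i in range(1, 2 ** k):            # classic subset-sum reachability
        v = a_value(i, k, m, theta)
        sums |= {s + v for s in sums}
    return target in sums
```

The hardness of course lies in the succinctness: $k$ and $m$ are the input size while the $a_i$ have $2^m$ bits, so this brute force runs in doubly exponential time.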

*Overview of the construction and encoding of the values* $a_i$. Given an instance $I$ of SSSP, our goal is to construct a D2VPPAsu $P = (C, \rho, \varphi)$ of dimension 2 such that $|P|$ is polynomial in $|\theta| + k + m$ and $L(P) \neq \emptyset$ iff $I$ has a solution.

The main idea is to ensure that $L(C) = \{X_1 e_1 \dots X_{2^k-1} e_{2^k-1} \# e_{2^k} \mid X_i \in \{0,1\}\}$, where the $X_i$ are internal symbols used to encode a subset $J \subseteq \{1,\dots,2^k-1\}$, and each $e_i$ is an encoding of $a_i$, defined later, over some alphabet containing the symbol $\mathbf{1}$, such that the number of occurrences of $\mathbf{1}$ in $e_i$ is $a_i$. In other words, $e_i$ somehow encodes $a_i$ in unary. For the vector part, the machine $P$, when running over $X_i e_i$, updates its dimensions depending on two cases: (1) if $X_i = 1$ ("put value $a_i$ in $J$"), then any transition reading $\mathbf{1}$ has weight $(1,0)$ and any other transition has weight $(0,0)$; (2) if $X_i = 0$, then every transition has weight $(0,0)$. So, if $X_i = 1$, the value in the first dimension after processing $X_i e_i$ has been incremented by $a_i$. Similarly, when processing $\# e_{2^k}$, any transition reading $\mathbf{1}$ increments the second dimension by 1, so that after processing $\# e_{2^k}$, this dimension has value $a_{2^k}$. The formula $\varphi(x_1, x_2)$ then only requires equality of $x_1$ and $x_2$, i.e. $\varphi(x_1, x_2) \equiv x_1 = x_2$.

We now explain how to encode $a_i$ by a well-nested word $e_i$. Due to the finite-visit restriction, every incrementing transition can be triggered at most once for each input position. Since the value $a_i$ is possibly doubly exponential in $m$ and

**Fig. 3.** On the left, the automaton $A_i$, for $i < m$. On the right, the automaton $A_m$.

we are allowed only a polynomial number of transitions (in $|\theta| + k + m$), necessarily $e_i$ must be of doubly exponential length. The main idea is to use the stack and the two-wayness to recognise, with a polynomial number of states, well-nested words of doubly exponential length. We need a series of intermediate lemmas to achieve this idea. We start with a useful result about intersections of finite automata, here *reversible* finite automata (deterministic and backward deterministic). Let $\Sigma = \{1,\dots,m\}$ and let us define recursively the sequence of words $(u_i)_{0 \le i \le m} \in \Sigma^*$ as follows: $u_0 = 1$, $u_i = u_{i-1}\, i\, u_{i-1}$ for $1 \le i < m$ and $u_m = u_{m-1}\, m\, u_{m-1}\, m$.

**Lemma 2.** *The word* $u_m$ *has length* $2^m$*, and there exist* $m$ *reversible finite automata* $A_0,\dots,A_m$ *(Fig. 3) such that* (i) *each* $A_i$ *has* $O(1)$ *states, and* (ii) $\bigcap_{i=1}^{m} L(A_i) = \{u_m\}$*.*

*Encoding of the values* $a_i$. The idea is to define a well-nested word $e_i$ over an alphabet of call symbols $\Sigma_c = \{c_1,\dots,c_m\}$, an alphabet of return symbols $\Sigma_r = \{r_1,\dots,r_m\}$ and an alphabet of internal symbols $\Sigma_\iota = \{0, 1, \mathbf{1}, \mathbf{0}\}$. The number of occurrences of $\mathbf{1}$ in $e_i$ will be exactly $a_i$, i.e. $\#_{\mathbf{1}}(e_i) = a_i$, and hence the Parikh automaton will just have to count the number of $\mathbf{1}$ occurrences. Let us remind the reader that $a_i$ is actually given by $\theta$, and therefore the automaton $P$ will somehow have to evaluate $\theta$ on valuations of its variables that will be contained in $e_i$. Let us now define the words $e_i$. For that, we call a *binary tree* either an internal symbol $\mathbf{1}$, $\mathbf{0}$, or a well-nested word of the form $c_j t_1 t_2 r_j$ where $t_1, t_2$ are themselves binary trees. For a well-nested word of the form $cwr$, a root-to-leaf branch $\pi$ is a sequence of calls $x_1 \dots x_n$ such that $cwr = x_1 w_1 x_2 w_2 \dots x_n w_n r_n w'_n r_{n-1} w'_{n-1} \dots r_2 w'_2 r_1$ where $x_1 = c$, $r_1 = r$, for some well-nested words $w_i, w'_i$ such that $w_n$ contains only internal symbols. The *height* of a binary tree $t$ is the maximal length of a root-to-leaf branch, and it is *complete* if all root-to-leaf branches have the same length. Note that the number of internal symbols of a complete binary tree of height $n$ is $2^n$.

Then, $e_i$ is the well-nested word defined by $e_i = c_{j_1} b_i d_1 t_1\, c_{j_2} b_i d_2 t_2 \dots c_{j_{2^m}} b_i d_{2^m} t_{2^m}\, r_{j_{2^m}} \dots r_{j_1}$ where


Our goal is now to prove that $e_i$ is a correct encoding of $a_i$.

**Lemma 3.** *For all* $i \in \{1,\dots,2^k\}$*,* $\#_{\mathbf{1}}(e_i) = a_i$*, where* $\#_{\mathbf{1}}(e_i)$ *denotes the number of occurrences of* $\mathbf{1}$ *in* $e_i$*.*

*Proof.* By Condition 2, every root-to-leaf branch of $e_i$ has length $2^m$. Therefore, for all $j \in \{1,\dots,2^m\}$, every root-to-leaf branch in $t_j$ has length $2^m - j$. In particular, $t_{2^m}$ does not contain any call symbol. Hence all the trees $t_j$ are complete binary trees of height $2^m - j$. So, every $t_j$ has $2^{2^m-j}$ internal symbols and by Condition 4, we get $\#_{\mathbf{1}}(t_j) = \theta[b_i d_j]\cdot 2^{2^m-j}$. Therefore, $\#_{\mathbf{1}}(e_i) = \sum_{j=1}^{2^m} \#_{\mathbf{1}}(t_j) = \sum_{j=1}^{2^m} \theta[b_i d_j]\cdot 2^{2^m-j} = a_i$.

Note that Condition 3 was not used in the previous proof, but it will be useful to define a succinct D2VPA recognising ei. The key result is the following. It states the existence of a succinct D2VPA which recognises exactly the candidate solutions to SSSP.

**Lemma 4.** *One can construct a* D2VPA B *such that* B *has polynomially many states in* $|\theta| + k + m$ *and* $L(B) = \{X_1 e_1 \dots X_{2^k-1} e_{2^k-1} \# e_{2^k} \mid X_i \in \{0,1\}\}$*.*

*Proof (Sketch).* First, we show the existence of a D2VPA $A$ with polynomially many states in $|\theta|+k+m$ such that $L(A) = \{e_i \mid i \in \{1,\dots,2^k\}\}$ (Proposition ?? in Appendix). The main idea is to construct succinct D2VPA which check each of the Conditions 1 to 4 of the definition of the encoding independently, and then to take their intersection (by running the first, then the second, etc.). Condition 1 is easy to check. For Condition 2, we rely on Lemma 2, and run sequentially the automata $A_i$ (in $m$ passes) to check independently that, for all $i$, each root-to-leaf branch has a sequence of indices that belongs to $L(A_i)$. Thanks to the reversibility of $A_i$, it is possible, when going upward in the tree, to recover the previous state of $A_i$. For Condition 3, we rely on the two-wayness to check succinctly that a sequence of $m$ bits is the successor of another one, by doing $O(m)$ passes over the two successive vectors. The stack is not necessary there. For Condition 4, we rely on the existence of a succinct 2DFA which accepts all the valuations that satisfy a given Boolean formula.

We can finally construct the D2VPPAsu $P = (C, \rho, \varphi)$ of dimension 2 whose language is non-empty iff the SSSP instance $I$ has a solution. The automaton $C$ performs a first pass on the whole word by running the automaton $B$ of Lemma 4, to check that the input is of the form $X_1 e_1 \dots X_{2^k-1} e_{2^k-1} \# e_{2^k}$. During this pass, no vector dimension is incremented. During a second pass, when reading some $X_i = 1$, $C$ goes to some state $q_1$ from which it increments the first dimension whenever $\mathbf{1}$ is read (all other transitions have value $(0,0)$). When reading some $X_{i+1}$, it stays in $q_1$ if $X_{i+1} = 1$ or moves to $q_0$ otherwise, from which no transition touches the counters. When reading $\#$, it goes to a state from which it increments only the second dimension on reading $\mathbf{1}$. Note that this automaton is *single-use*: any symbol $\mathbf{1}$ occurring in the input word is counted at most once. It is even finite-visit (each position is visited $O(m + k + |\theta|)$ times). Finally, one only needs to check whether the first dimension equals the second one, using the formula $\varphi(x_1, x_2) \equiv x_1 = x_2$. Note that the following lemma proves Theorem 4, since SSSP is NExpTime-complete.

**Lemma 5.** *Given an instance* $X, Y, \theta$ *of SSSP, one can construct a* D2VPPAsu $P$ *of polynomial size in* $|\theta| + |X| + |Y|$ *such that* $L(P) \neq \emptyset$ *iff the SSSP instance has a solution.*

# **5 Applications to Decision Problems for Nested Word Transducers**

In this section, we give two applications of 2VPPA, namely to decision problems for two-way visibly pushdown transducers (2VPT). 2VPT were introduced in [8] as a model to define transductions from well-nested words to words, or, modulo tree linearisation, from trees to words. It was shown that, even in their deterministic single-use version, they can express all functions from well-nested words to words definable in MSOT, in the sense of Courcelle [6], while having a decidable equivalence problem; no upper bound was provided, however. Using 2VPPA, we show that the equivalence of 2VPTsu defining functions can be tested in NExpTime. We also consider other standard problems from transducer theory and show, again using 2VPPA, their decidability. First, let us define 2VPT formally.

A *two-way visibly pushdown transducer* (2VPT for short) is a pair $(A, \mu)$ where $A$ is a 2VPA and $\mu$ is a morphism from the sequences of transitions $\delta^*$ to words $\Gamma^*$ over some output alphabet $\Gamma$. A run of a 2VPT is a run of its underlying 2VPA. The *output* of a run $\rho$ of the form $(q_0, i_0, d_0, \sigma_0)\,\tau_1\,(q_1, i_1, d_1, \sigma_1)\,\tau_2 \dots \tau_\ell\,(q_\ell, i_\ell, d_\ell, \sigma_\ell)$ is $\mu(\tau_1 \dots \tau_\ell)$. A run is accepted if it is accepted by the underlying automaton. The transduction defined by a 2VPT is the set of pairs $(u, v)$ such that $v$ is the output of some accepting run on $u$. A state $p$ of a 2VPT is *producing* if there exists a transition $\tau$ such that $p$ is the first component of $\tau$ and $\mu(\tau) \neq \varepsilon$. Similarly to Parikh automata, a 2VPT $T$ is single-use (denoted 2VPTsu) if, for any valid run of $T$, we do not reach the same position twice in the same producing state. It is deterministic, denoted D2VPT, if its underlying automaton is deterministic.

*Deciding the* k*-valuedness and equivalence problems.* For any positive integer $k$, we say that a transducer is k*-valued* if every input word has at most $k$ different outputs. In particular, it is 1-valued if it defines a (partial) function, and it is then also called *functional*.

**Theorem 5.** *Let* T *be a* 2VPTsu*, and* k *an integer. Then the* k*-valuedness of* T *can be decided in* NExpTime*. It is also* ExpTime-hard*.*

The theorem is proved by reducing the $k$-valuedness of $T$ to the emptiness of a 2VPPAsu $P$ that guesses $k+1$ runs of $T$ producing $k+1$ different outputs. To ensure that the outputs are different, during each run $P$ guesses, and stores in counters, $k$ output positions and the letters produced at these positions. The formula of $P$ simply checks, at the end, for each pair of runs, that the same positions were guessed by both runs and that the letters differ, ensuring that the guessed runs have pairwise different outputs. As two functional transducers are equivalent if they have the same domain and their union is 1-valued, we get the following corollary.

**Corollary 2.** *The equivalence of two functional* 2VPTsu $T$ *and* $T'$ *can be decided in* NExpTime*. It is also* ExpTime*-hard.*

The NExpTime complexity bound for equivalence of tree-to-string transducers was already established for *streaming tree-to-string transducers* (STST), introduced in [1]. However, the conversion between 2VPTsu and STST yields an exponential blow-up.

We can generalize Corollary 2 to *strictly* $k$-valued transducers. We say that a transducer $T$ is strictly $k$-valued if each input word in the domain of $T$ has *exactly* $k$ different images. Then, similarly to the previous corollary, two strictly $k$-valued transducers are equivalent if, and only if, they have the same domain and their union is $k$-valued.

**Corollary 3.** *The equivalence of two strictly* k*-valued* 2VPTsu $T$ *and* $T'$ *can be decided in* NExpTime*. It is also* ExpTime*-hard.*

Strict $k$-valuedness is however an undecidable property (this can be shown using the Post correspondence problem), even for $k = 2$. The equivalence problem for $k$-valued 2VPTsu (not necessarily strictly $k$-valued) is open already in the stack-less case, and a (very) particular case has been solved in [14].

*Type-checking against Parikh properties.* Given a 2VPT $T$, it might be desirable to check properties of the output words it produces, i.e., for a language $L$, whether the codomain of $T$ is included in $L$. Formally, the *type-checking problem* asks, given a transducer $T$ and a language $L$, whether $T(\Sigma^*) \subseteq L$. Unfortunately, this problem is undecidable when $L$ is given by a visibly pushdown automaton (and $T$ is a VPT) [13]. Nevertheless, we show that the type-checking problem is decidable when $T$ is a 2VPTsu and $L$ is the complement of the language of a (stack-less) Parikh automaton. As a consequence, we can decide whether a 2VPTsu $T$ produces only well-nested words, i.e. whether the output alphabet of $T$ is structured and, for every input word $u$ and every $v \in T(u)$, $v$ is a well-nested word.

**Theorem 6.** *Let* T *be a* 2VPTsu *and* P *be a (stack-free) Parikh Automaton over the output alphabet of* T*. Then we can decide whether* T(Σ∗) ∩ L(P) = ∅ *in* NExpTime*. It is also* ExpTime-hard*.*

This is done by constructing a 2VPPAsu $P'$ which simulates $T$ and, instead of producing letters, simulates $P$ on the output of $T$. A word $w$ over a structured alphabet $\Sigma$ is not well-nested if either $|w|_c \neq |w|_r$, i.e. the number of call letters differs from the number of return letters, or there exists a prefix $u$ of $w$ such that $|u|_c < |u|_r$. As this can be checked by a (non-deterministic) Parikh automaton, we get the following corollary.
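The two conditions of this criterion amount to a single left-to-right scan with one counter; a minimal sketch, with the partition of the alphabet into calls and returns assumed given:

```python
# A word over a structured alphabet is well-nested iff the call/return counts
# match and no prefix sees more returns than calls.

def well_nested(word, calls, returns):
    height = 0
    for letter in word:
        if letter in calls:
            height += 1
        elif letter in returns:
            height -= 1
            if height < 0:        # prefix u with |u|_c < |u|_r
                return False
    return height == 0            # overall |w|_c == |w|_r

calls, returns = {"c"}, {"r"}
print(well_nested("cir", calls, returns))  # True
print(well_nested("rc", calls, returns))   # False
```

A nondeterministic Parikh automaton checks the complement of this property by guessing which of the two conditions fails (and, in the second case, where the offending prefix ends).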

**Corollary 4.** *Let* T *be a* 2VPTsu *whose output alphabet is structured. It can be decided in* CoNExpTime *whether* T *only produces well-nested words.*

**Acknowledgements.** This work was supported by the Belgian FNRS CDR project Flare (J013116), the ARC project Transform (Fédération Wallonie-Bruxelles) and by the ANR Project *DELTA*, ANR-16-CE40-0007. Emmanuel Filiot is an FNRS research associate (Chercheur Qualifié).

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Kleene Algebra with Hypotheses**

Amina Doumane<sup>1,2</sup>, Denis Kuperberg<sup>1</sup>, Damien Pous<sup>1</sup>, and Pierre Pradic<sup>1,2</sup>

<sup>1</sup> Univ Lyon, EnsL, UCBL, CNRS, LIP, 69342 Lyon Cedex 07, France
denis.kuperberg@ens-lyon.fr
<sup>2</sup> Warsaw University, MIMUW, Warsaw, Poland

**Abstract.** We study the Horn theories of Kleene algebras and star continuous Kleene algebras, from the complexity point of view. While their equational theories coincide and are PSpace-complete, their Horn theories differ and are undecidable. We characterise the Horn theory of star continuous Kleene algebras in terms of downward closed languages and we show that when restricting the shape of allowed hypotheses, the problems lie in various levels of the arithmetical or analytical hierarchy. We also answer a question posed by Cohen about hypotheses of the form 1 = *S* where *S* is a sum of letters: we show that it is decidable.

**Keywords:** Kleene algebra · Hypotheses · Horn theory · Complexity

### **1 Introduction**

Kleene algebras [6,10] are idempotent semirings equipped with a unary operation *star* such that $x^*$ intuitively corresponds to the sum of all powers of $x$. They admit several models which are important in practice: formal languages, where $L^*$ is the Kleene star of a language $L$; binary relations, where $R^*$ is the reflexive-transitive closure of a relation $R$; matrices over various semirings, where $M^*$ can be used to perform flow analysis.

A fundamental result is that their equational theory is decidable, and actually PSpace-complete. This follows from a completeness result proved independently by Kozen [11], Krob [17] and Boffa [3], and from the fact that checking language equivalence of two regular expressions is PSpace-complete: given two regular expressions, we have

$$\mathsf{KA} \vdash e \leq f \quad \text{iff} \quad [e] \subseteq [f]$$

(where $\mathsf{KA} \vdash e \le f$ denotes provability from the Kleene algebra axioms, and $[e]$ is the language of the regular expression $e$).

This work has been supported by the European Research Council (ERC) under the European Union's Horizon 2020 programme (CoVeCe, grant agreement No 678157) and by the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program "Investissements d'Avenir" (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR).

© The Author(s) 2019

M. Bojańczyk and A. Simpson (Eds.): FOSSACS 2019, LNCS 11425, pp. 207–223, 2019. https://doi.org/10.1007/978-3-030-17127-8_12

Because of their interpretation in the algebra of binary relations, Kleene algebras and their extensions have been used to reason abstractly about program correctness [1,2,9,12,15]. For instance, if two programs can be abstracted into the two relational expressions $(R^*; S)^*$ and $((R \cup S)^*; S)^=$, then we can deduce that these programs are equivalent by checking that the regular expressions $(a^*b)^*$ and $(a + b)^*b + 1$ denote the same language. This technique made it possible to automate reasoning steps in proof assistants [4,16,19].
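The language identity used in this example can be sanity-checked on bounded words (a check on finitely many words, not a proof; Python's `re` syntax stands in for regular expressions, with the `+ 1` summand rendered as an optional empty match):

```python
import re
from itertools import product

# Compare two regular expressions on every word over {a, b} up to length n.

def same_language_up_to(r1, r2, n, alphabet="ab"):
    p1, p2 = re.compile(r1), re.compile(r2)
    for length in range(n + 1):
        for w in map("".join, product(alphabet, repeat=length)):
            if bool(p1.fullmatch(w)) != bool(p2.fullmatch(w)):
                return False
    return True

# (a*b)*  versus  (a+b)*b + 1: both denote the empty word together with all
# words ending in b.
print(same_language_up_to(r"(a*b)*", r"((a|b)*b)?", 8))  # True
```

Of course, a complete decision procedure compares the minimal DFAs (or runs a PSpace on-the-fly equivalence check) rather than enumerating words.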

In such a scenario, one often has to reason under assumptions. For instance, if we can abstract our programs into the relational expressions $(R + S)^*$ and $S^*; R^*$, then we can deduce algebraically that the starting programs are equal if we know that $R; S = R$ (i.e., that $S$ is a no-op when executed after $R$). When doing so, we move from the equational theory of Kleene algebras to their Horn theory: we want to know whether a given set of equations, the *hypotheses*, entails another equation in all Kleene algebras. Unfortunately, this theory is undecidable in general [13]. In this paper, we continue the work initiated by Cohen [5] and pursued by Kozen [13], by characterising the precise complexity of new subclasses of this general problem.

A few cases have been shown to be decidable in the literature, when we restrict the form of the hypotheses:


(In the first two cases, the complexity can be shown to remain in PSpace.) We add one positive case, which was listed as open by Cohen [5], and which is typically useful to express that a certain number of predicates cover all cases:

– when hypotheses are of the form S = 1 for S a sum of letters.

Conversely, Kozen also studied the precise complexity of various undecidable subclasses of the problem [13]. For those, one has to be careful about the precise definition of Kleene algebras. Indeed, these only form a quasi-variety (their definition involves two implications), and one often considers *∗-continuous* Kleene algebras [6], which additionally satisfy an infinitary implication (we define these formally in Sect. 2). While the equational theory of Kleene algebras coincides with that of ∗-continuous Kleene algebras, this is not the case for their Horn theories: there exist Horn sentences that are valid in all ∗-continuous Kleene algebras but not in all Kleene algebras.

Kozen [13] showed for instance that when hypotheses are of the form pq = qp for pairs of letters (p, q), then validity of an implication in all ∗-continuous Kleene algebras is Π⁰₁-complete, while it is only known to be ExpSpace-hard for plain Kleene algebras. In fact, for plain Kleene algebras, the only known negative result is that the problem is undecidable for hypotheses of the form u = v for


**Fig. 1.** Summary of the main results.

pairs (u, v) of words (Kleene star plays no role in this undecidability result: this is just the word problem). We show that it is already undecidable, and in fact Σ⁰₁-complete, when hypotheses are of the form a ≤ S where a is a letter and S is a sum of letters. We use an encoding similar to that of [13] to relate the Horn theories of KA and KA∗ to runs of Turing machines and alternating linearly bounded automata. This allows us to show that deciding whether an inequality w ≤ f holds, where w is a word, in the presence of sum-of-letters hypotheses is ExpTime-complete. We also refine the Π¹₁-completeness result obtained in [13] for general hypotheses, by showing that hypotheses of the form a ≤ g where a is a letter already make the problem Π¹₁-complete.

The key notion we define and exploit in this paper is the following: given a set H of equations, and given a language L, write clH(L) for the smallest language containing L such that for all hypotheses (e ≤ f) ∈ H and all words u, v,

$$\text{if} \quad u[f]v \subseteq \text{cl}\_H(L) \quad \text{then} \quad u[e]v \subseteq \text{cl}\_H(L) \quad .$$

This notion makes it possible to characterise the Horn theory of ∗-continuous Kleene algebras, and to approximate that of Kleene algebras: we have

$$\mathsf{KA}\_{H} \vdash e \leq f \quad \Rightarrow \quad \mathsf{KA}\_{H}^{\*} \vdash e \leq f \quad \Leftrightarrow \quad [e] \subseteq \mathrm{cl}\_{H}([f])$$

where KA_H ⊢ e ≤ f (resp. KA∗_H ⊢ e ≤ f) denotes provability in Kleene algebra (resp. ∗-continuous Kleene algebra) under the hypotheses H. We study downward-closed languages and prove the above characterisation in Sect. 3.

The first implication can be strengthened into an equivalence in a few cases, for instance when the regular expression e and the right-hand sides of all hypotheses denote finite languages, or when hypotheses have the form 1 = S for S a sum of letters. We obtain decidability in those cases (Sect. 4).

Then we focus on cases where hypotheses are of the form a ≤ e for a a letter, and we show that most problems are already undecidable there. We do so by exploiting the characterisation in terms of downward closed languages to provide encodings of various undecidable problems on Turing machines, total Turing machines, and linearly bounded automata (Sect. 5).

We summarise our results in Fig. 1. The top of each column restricts the type of allowed hypotheses. Variables e, f stand for general expressions, u, w for words, and a, b for letters. Grayed statements are implied by non-grayed ones.

*Notations.* We let a, b range over the letters of a finite alphabet Σ. We let u, v, w range over the words over Σ, whose set is written Σ∗. We write ε for the empty word, uv for the concatenation of two words u, v, and |w| for the length of a word w. We write Σ⁺ for the set of non-empty words. We let e, f, g range over the regular expressions over Σ, whose set is written Exp_Σ. We write [e] for the language of such an expression e: [e] ⊆ Σ∗. We sometimes implicitly regard a word as a regular expression. If X is a set, we write P(X) (resp. P_fin(X)) for the set of its subsets (resp. finite subsets), and |X| for its cardinality.

A long version of this extended abstract is available on HAL [8], with most proofs in appendix.

# **2 The Systems KA and KA∗**

**Definition 1 (**KA,KA∗**).** *A* Kleene algebra *is a tuple* (M, 0, 1, +, ·, ∗) *where* (M, 0, 1, +, ·) *is an idempotent semiring and the following axioms and implications, where the partial order* ≤ *is defined by* x ≤ y *if* x + y = y*, hold for all* x, y ∈ M*.*


*A Kleene algebra is* ∗*-continuous if it satisfies the following implication:*

$$(\forall i \in \mathbb{N}, \ xy^i z \le t) \implies xy^\* z \le t$$

*A* hypothesis *is an inequation of the form* e ≤ f*, where* e *and* f *are regular expressions. If* H *is a set of hypotheses, and* e, f *are regular expressions, we write* KA_H ⊢ e ≤ f *(resp.* KA∗_H ⊢ e ≤ f*) if* e ≤ f *is derivable from the axioms and implications of* KA *(resp.* KA∗*) as well as the hypotheses from* H*. We omit the subscript when* H *is empty.*

Note that the letters appearing in the hypotheses are constants: they are not universally quantified. In particular, if H = {aa ≤ a}, we may deduce KA_H ⊢ a∗ ≤ a but not KA_H ⊢ b∗ ≤ b.

Languages over the alphabet Σ form a ∗-continuous Kleene algebra, as do binary relations over an arbitrary set.

In the absence of hypotheses, provability in KA coincides with provability in KA∗ and with language inclusion:

**Theorem 1 (Kozen** [11]**).**

$$\mathsf{KA} \vdash e \leq f \quad \Leftrightarrow \quad \mathsf{KA}^\* \vdash e \leq f \quad \Leftrightarrow \quad [e] \subseteq [f]$$


We will classify the theories based on the shape of hypotheses we allow; we list them below (I is a finite non-empty set):

We call *letter hypotheses* any class of hypotheses whose left-hand sides are letters (the last four classes). In the rest of the paper, we study the following problem from a complexity point of view: given a set of C-hypotheses H, where C is one of the classes listed above, and two expressions e, f ∈ Exp_Σ, can we decide whether KA_H ⊢ e ≤ f (resp. KA∗_H ⊢ e ≤ f) holds? We call this the problem of **deciding** KA **(resp.** KA∗**) under** C**-hypotheses**.

#### **3 Closure of Regular Languages**

It is known that provability in KA and KA<sup>∗</sup> can be characterised by language inclusions (Theorem 1). In the presence of hypotheses, this is not the case anymore: we need to take the hypotheses into account in the semantics. We do so by using the following notion of *downward closure* of a language.

#### **3.1 Definition of the Closure**

**Definition 2 (**H**-closure).** *Let* H *be a set of hypotheses and* L ⊆ Σ<sup>∗</sup> *be a language. The* H-closure of L*, denoted* clH(L)*, is the smallest language* K *such that* L ⊆ K *and for all hypotheses* e ≤ f ∈ H *and all words* u, v ∈ Σ∗*, we have*

$$u[f]v \subseteq K \qquad \Rightarrow \qquad u[e]v \subseteq K$$

Alternatively, cl_H(L) can be defined as the least fixed point of the function φ_L : P(Σ∗) → P(Σ∗) defined by φ_L(X) = L ∪ ψ_H(X), where

$$\psi\_H(X) = \bigcup\_{(e \le f) \in H} \{ u[e]v \mid u, v \in \Sigma^\*, u[f]v \subseteq X \}.$$

*Example 1.* If H = {ab ≤ ba}, then cl_H([b∗a∗]) = [(a + b)∗], while cl_H([a∗b∗]) = [a∗b∗].
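Since the hypothesis ab ≤ ba relates words of equal length, the closure restricted to words of a bounded length is a finite fixpoint, so Example 1 can be checked mechanically on small words. A minimal sketch (function name ours):

```python
from itertools import product

def closure_ab_le_ba(words):
    """H-closure for H = {ab <= ba}: whenever u·ba·v is present, add u·ab·v."""
    todo, cl = list(words), set(words)
    while todo:
        w = todo.pop()
        for i in range(len(w) - 1):
            if w[i:i + 2] == 'ba':
                u = w[:i] + 'ab' + w[i + 2:]
                if u not in cl:
                    cl.add(u)
                    todo.append(u)
    return cl

n = 6
b_star_a_star = {'b' * i + 'a' * j for i in range(n + 1) for j in range(n + 1 - i)}
all_words = {''.join(p) for k in range(n + 1) for p in product('ab', repeat=k)}
print(closure_ab_le_ba(b_star_a_star) == all_words)  # True: cl_H([b*a*]) = [(a+b)*]
```

Intuitively, every word over {a, b} can be sorted into b's-then-a's by repeatedly rewriting ab into ba, which is why the closure of [b∗a∗] is the full language.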

In order to manipulate closures more conveniently, we introduce a syntactic object witnessing membership in a closure: derivation trees.

**Definition 3.** *Let* H *be a set of hypotheses and* L *a regular language. We define an infinitely branching proof system related to* clH(L)*, where statements are regular expressions, and rules are the following, called respectively* axiom*,* extension*, and* hypothesis*:*

$$\frac{}{u}\ u \in L \qquad \frac{(u)\_{u \in [e]}}{e} \qquad \frac{ufv}{uwv}\ w \in [e],\ e \le f \in H$$

*We write* ⊢_{H,L} e *if* e *is derivable in this proof system, i.e., if there is a well-founded tree using these rules, with root* e *and all leaves labelled by words in* L*. Such a tree will be called a* derivation tree *for* [e] ⊆ cl_H(L) *(or for* e ∈ cl_H(L) *if* e *is a word).*

*Example 2.* The following derivation is a derivation tree for bababa ∈ clH([b∗a∗]), where H = {ab ≤ ba}.


Derivation trees witness membership in the closure, as shown by the following proposition.

**Proposition 1.** [e] ⊆ cl_H(L) *iff* ⊢_{H,L} e*.*

(See [8, App. A] for a proof.)

#### **3.2 Properties of the Closure Operator**

We summarise in this section some useful properties of the closure. Lemma 1 shows in particular that the closure is idempotent, monotonic (both in the set of hypotheses and in the language argument), and compatible with context application. Lemma 2 shows that inner closure operators can be removed when evaluating regular expressions. Both lemmas are proved in [8, App. A].

**Lemma 1.** *Let* A, B, U, V ⊆ Σ∗*. We have*

*1.* A ⊆ cl_H(A)
*2.* cl_H(cl_H(A)) = cl_H(A)
*3.* A ⊆ B *implies* cl_H(A) ⊆ cl_H(B)
*4.* H ⊆ H′ *implies* cl_H(A) ⊆ cl_{H′}(A)
*5.* cl_H(A) ⊆ cl_H(B) *if and only if* A ⊆ cl_H(B)
*6.* A ⊆ cl_H(B) *implies* UAV ⊆ cl_H(UBV)

**Lemma 2.** *Let* A, B ⊆ Σ∗*, then*

*1.* cl_H(A + B) = cl_H(cl_H(A) + cl_H(B))
*2.* cl_H(AB) = cl_H(cl_H(A)cl_H(B))
*3.* cl_H(A∗) = cl_H(cl_H(A)∗)

#### **3.3 Relating Closure and Provability in KA_H and KA∗_H**

We show that provability in KA∗ can be characterised by closure inclusions. In KA, provability implies closure inclusion, but the converse does not hold in general.

**Theorem 2.** *Let* H *be a set of hypotheses and* e, f *be two regular expressions.*

$$\mathsf{KA}\_{H} \vdash e \leq f \qquad \Rightarrow \qquad \mathsf{KA}\_{H}^{\*} \vdash e \leq f \qquad \Leftrightarrow \qquad [e] \subseteq \mathrm{cl}\_{H}([f])$$

*Proof.* Let CReg_{H,Σ} = {cl_H(L) | L ∈ Reg_Σ}, on which we define the following operations:

$$X \oplus Y = \text{cl}\_H(X+Y) \qquad X \odot Y = \text{cl}\_H(X \cdot Y) \qquad X^{\oplus} = \text{cl}\_H(X^\*).$$

We define the *closure model* F_{H,Σ} = (CReg_{H,Σ}, ∅, {ε}, ⊕, ⊙, ^⊕).

We write ≤ for the inequality induced by ⊕ in F_{H,Σ}: X ≤ Y if X ⊕ Y = Y.

**Lemma 3.** F_{H,Σ} = (CReg_{H,Σ}, ∅, {ε}, ⊕, ⊙, ^⊕) *is a* ∗*-continuous Kleene algebra. The inequality* ≤ *of* F_{H,Σ} *coincides with inclusion of languages.*

*Proof.* By Lemma 2, the function cl_H : (P(Σ∗), +, ·, ∗) → (CReg_{H,Σ}, ⊕, ⊙, ^⊕) is a homomorphism. We show that F_{H,Σ} is a ∗-continuous Kleene algebra. First, the identities of Lang_Σ = (P(Σ∗), +, ·, ∗) are propagated through the homomorphism cl_H, so only the Horn formulas defining ∗-continuous Kleene algebras remain to be verified. It suffices to prove that F_{H,Σ} satisfies the ∗-continuity implication, because the implication xy ≤ y ⇒ x∗y ≤ y and its dual can be deduced from it. Let A, B, C, D ∈ F_{H,Σ} be such that A ⊙ B^{⊙i} ⊙ C ≤ D for all i ∈ N, where B^{⊙i} = B ⊙ ··· ⊙ B (i factors). By Lemma 2, A ⊙ B^{⊙i} ⊙ C = cl_H(AB^iC), so we have cl_H(AB^iC) ≤ D, and in particular AB^iC ≤ D for all i. By ∗-continuity of Lang_Σ, we obtain AB∗C ≤ D. By Lemma 1 and using D = cl_H(D), we obtain cl_H(AB∗C) ≤ D, and finally, by Lemma 2, A ⊙ B^⊕ ⊙ C ≤ D. This completes the proof that F_{H,Σ} is a ∗-continuous Kleene algebra.

Let A, B ∈ CReg_{H,Σ}. We have A ≤ B ⇔ A ⊕ B = B ⇔ cl_H(A + B) = B ⇔ A ⊆ B. Finally, if e ≤ f is a hypothesis from H, then cl_H([e]) ⊆ cl_H([f]), so the hypothesis is satisfied in F_{H,Σ}.

The implications KA_H ⊢ e ≤ f ⇒ [e] ⊆ cl_H([f]) and its ∗-continuous analogue follow from the fact that if an inequation e ≤ f is derivable in KA_H (resp. KA∗_H), then it holds in every model, in particular in the model F_{H,Σ}; thus cl_H([e]) ⊆ cl_H([f]) or, equivalently, [e] ⊆ cl_H([f]).

Let us prove that for any regular expressions e, f, if [e] ⊆ cl_H([f]) then KA∗_H ⊢ e ≤ f. Let e, f be two such expressions and let T be a derivation tree for [e] ⊆ cl_H([f]), i.e., witnessing ⊢_{H,[f]} e. We show that we can transform this tree T into a proof tree in KA∗_H. The extension rule is an instance of [8, App. A, Lem. 12]. Finally, the hypothesis rule is also provable in KA∗_H, using the hypothesis e ≤ f together with compatibility of ≤ with concatenation, and completeness of KA∗ for membership u ∈ [e]. We can therefore build from the tree T a proof witnessing KA∗_H ⊢ e ≤ f.

When we restrict the shape of the expression e to words, and hypotheses to (w ≤ w)-hypotheses, we get the implication missing from Theorem 2.

**Proposition 2.** *Let* H *be a set of* (w ≤ w)*-hypotheses,* w ∈ Σ<sup>∗</sup> *and* f ∈ ExpΣ*.*

$$\mathsf{KA}\_{H} \vdash w \leq f \qquad \Leftrightarrow \qquad w \in \mathrm{cl}\_{H}([f])$$

*Proof.* Let us show that w ∈ clH([f]) implies KA<sup>H</sup> w ≤ f. We proceed by induction on the height of a derivation tree for w ∈ clH([f]). If this tree is just a leaf, then w ∈ [f] and by Theorem 1 KA w ≤ f. Otherwise, this derivation starts with the following steps:

$$\frac{\dfrac{\left(\dfrac{\cdots}{uw\_i v}\right)\_i}{u(\sum\_i w\_i)v}}{uwv}\ \ w \le \sum\_i w\_i \in H$$

Our inductive assumption is that KA_H ⊢ u w_i v ≤ f for all i, thus KA_H ⊢ Σ_i u w_i v ≤ f. We also have KA_H ⊢ w ≤ Σ_i w_i, hence KA_H ⊢ uwv ≤ Σ_i u w_i v by compatibility with concatenation and distributivity, and therefore KA_H ⊢ uwv ≤ f.

#### **4 Decidability of KA and KA∗ with (1 = x)-Hypotheses**

In this section, we answer positively the question, posed by Cohen [5], of the decidability of KA_H where H is a set of (1 = x)-hypotheses:

**Theorem 3.** *If* H *is a set of* (1 = x)*-hypotheses, then* KA_H *is decidable.*

To prove this theorem we show that in the case of (1 = x)-hypotheses:

(P1) KA_H ⊢ e ≤ f if and only if [e] ⊆ cl_H([f]).

(P2) cl_H([f]) is regular, and we can effectively compute an expression for it.

Decidability of KA_H follows immediately from (P1) and (P2), since it amounts to checking a language inclusion between two regular expressions.

To show (P1) and (P2), it is enough to prove the following result:

**Theorem 4.** *Let* H *be a set of* (1 = x)*-hypotheses and let* f *be a regular expression. The language* cl_H([f]) *is regular, and we can effectively compute an expression* c *such that* [c] = cl_H([f]) *and* KA_H ⊢ c ≤ f*.*

(P2) follows immediately from Theorem 4. To show (P1), it is enough to prove that [e] ⊆ cl_H([f]) implies KA_H ⊢ e ≤ f, since the other implication always holds (Theorem 2). Let e, f be such that [e] ⊆ cl_H([f]). If c is the expression given by Theorem 4, we have KA_H ⊢ c ≤ f and [e] ⊆ [c], so by Theorem 1, KA ⊢ e ≤ c; combining the two yields KA_H ⊢ e ≤ f, which concludes the proof.

To prove Theorem 4, we first show that closure under (1 = x)-hypotheses can be decomposed into closure under (x ≤ 1)-hypotheses followed by closure under (1 ≤ x)-hypotheses:

**Proposition 3 (Decomposition result).** *Let* H = {1 = S<sup>j</sup> | j ∈ J} *be a set of* (1 = x)*-hypotheses.*

*We set* H_sum = {1 ≤ S_j | j ∈ J} *and* H_id = {a ≤ 1 | a ∈ [S_j], j ∈ J}*. For every language* L ⊆ Σ∗*, we have* cl_H(L) = cl_{H_sum}(cl_{H_id}(L))*.*

*Sketch.* We show that rules from H_id can be locally permuted with rules from H_sum in a derivation tree. This allows us to compute a derivation tree in which all rules from H_id occur after (i.e., closer to the leaves than) the rules from H_sum.

We now show results similar to Theorem 4, but applying to (x ≤ 1)-hypotheses and (1 ≤ x)-hypotheses (Propositions 5 and 6 below). To prove Theorem 4, the idea is to decompose H into H_id and H_sum using the decomposition property (Proposition 3), and then to apply Propositions 5 and 6 to H_id and H_sum respectively.

To show these two propositions, we make use of a result from [7]:

**Definition 4.** *Let* A = (Q, Δ, ι, F) *be an NFA,* H *a set of hypotheses, and* ϕ : Q → Exp_Σ *a function from states to expressions. We say that* ϕ *is* H-compatible *with* A *if:*

*–* KA_H ⊢ 1 ≤ ϕ(q) *whenever* q ∈ F*,*
*–* KA_H ⊢ aϕ(r) ≤ ϕ(q) *for all transitions* (q, a, r) ∈ Δ*.*

*We set* ϕ<sup>A</sup> = ϕ(ι)*.*

**Proposition 4 (**[7]**).** *Let* A *be an NFA,* H *a set of hypotheses, and* ϕ *a function* H*-compatible with* A*. We can construct a regular expression* f_A *such that:*

> [f_A] = [A] *and* KA_H ⊢ f_A ≤ ϕ_A

**Proposition 5.** *Let* H *be a set of* (x ≤ 1)*-hypotheses and let* f *be a regular expression. The language* cl_H([f]) *is regular, and we can effectively compute an expression* c *such that* [c] = cl_H([f]) *and* KA_H ⊢ c ≤ f*.*

*Proof.* Let K = cl_H([f]) and Γ = {a | (a ≤ 1) ∈ H}; we first show that K is regular. If A is an NFA for f, an NFA A_id recognising K can be built from A by adding a Γ-labelled self-loop on every state: these loops allow the automaton to ignore any letter from Γ. It is straightforward to verify that the resulting NFA recognises K.

For every q ∈ Q, let f_q be a regular expression such that [f_q] = [q]_A, where [q]_A denotes the language accepted from q in A. Let ϕ : Q → Exp_Σ map each state q of A_id (which is also a state of A) to ϕ(q) = f_q. Let us show that ϕ is H-compatible with A_id. If q ∈ F, then 1 ∈ [f_q], so by completeness of KA, we have KA ⊢ 1 ≤ f_q. Let (p, a, q) be a transition of A_id. Either (p, a, q) ∈ Δ, in which case a[f_q] ⊆ [f_p], and so by Theorem 1, KA ⊢ a f_q ≤ f_p. Or p = q and the transition is one of the added loops; then KA_H ⊢ a ≤ 1, so KA_H ⊢ a f_p ≤ f_p, which concludes the proof.

By Proposition 4, we can now construct a regular expression c which satisfies the desired properties.
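The automaton A_id of this proof is easy to realise concretely. In the sketch below (helper names ours), f = a·b∗ and H = {c ≤ 1}, so the closure consists of the words of [f] with occurrences of c inserted anywhere:

```python
def accepts(delta, starts, finals, word):
    """Standard NFA acceptance; delta maps (state, letter) to a set of states."""
    states = set(starts)
    for ch in word:
        states = {r for q in states for r in delta.get((q, ch), ())}
    return bool(states & finals)

def add_gamma_loops(delta, states, gamma):
    """Build A_id: add a self-loop labelled by each letter of Gamma on every state."""
    d = {k: set(v) for k, v in delta.items()}
    for q in states:
        for a in gamma:
            d.setdefault((q, a), set()).add(q)
    return d

# NFA for a·b*: 0 --a--> 1, 1 --b--> 1, final state 1.
delta = {(0, 'a'): {1}, (1, 'b'): {1}}
d_id = add_gamma_loops(delta, {0, 1}, {'c'})  # Gamma = {c}, from H = {c <= 1}

print(accepts(d_id, {0}, {1}, 'cabcb'))  # True: erasing the c's leaves abb
print(accepts(d_id, {0}, {1}, 'cb'))     # False: erasing c leaves b, not in a·b*
```

The loops exactly implement "ignore any letter from Γ", so the new NFA accepts w iff some word of [f] is obtained from w by erasing letters of Γ.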

**Definition 5.** *Let* Γ *be a set of letters. A language* L *is said to be* Γ-closed *if:*

$$\forall u, v \in \Sigma^\*, \forall a \in \Gamma \qquad uv \in L \qquad \Rightarrow \qquad uav \in L$$

*If* H = {1 ≤ S_i | i ∈ I} *is a set of* (1 ≤ x)*-hypotheses, we say that a language* L *is* H*-closed if it is* Γ*-closed, where* Γ = ∪_{i∈I} [S_i]*.*

*Remark 1.* If H is a set of (x ≤ 1)-hypotheses and Γ = {a | (a ≤ 1) ∈ H}, then cl_H(L) is Γ-closed for every language L.

**Proposition 6.** *Let* H *be a set of* (1 ≤ x)*-hypotheses and let* f *be a regular expression whose language is* H*-closed. The language* cl_H([f]) *is regular, and we can effectively compute an expression* c *such that* [c] = cl_H([f]) *and* KA_H ⊢ c ≤ f*.*

*Proof.* We set L = [f], H = {1 ≤ S<sup>j</sup> | j ∈ J} and Γ = {a | a ∈ [S<sup>j</sup> ], j ∈ J}.

Let us show that cl_H(L) is regular. The idea is to construct a set of words L_♯, where each word u_♯ is obtained from a word u of cl_H(L) by inserting, at each position where a rule (1 ≤ S_j) is applied in a derivation tree for u ∈ cl_H(L), a new symbol ♯_j. We will show that this set satisfies the two following properties:

– cl_H(L) is obtained from L_♯ by erasing the symbols ♯_j.
– L_♯ is regular.

Since the operation that erases letters preserves regularity, we obtain as a corollary that cl_H(L) is regular.

Let us now introduce the language L_♯ precisely and show the properties it satisfies. Let Θ = {♯_j | j ∈ J} be a set of new letters and Σ_♯ = Σ ∪ Θ be the alphabet Σ enriched with these new letters.

We define the function *exp* : Σ_♯ → P(Σ) that expands every letter ♯_j into the set of letters corresponding to its rule in H, as follows:

$$\begin{aligned} \exp(a) &= \{a\} & \text{if } a \in \Sigma\\ \exp(\sharp\_j) &= \{ a \mid a \in [S\_j] \} & \forall j \in J \end{aligned}$$

This function extends naturally to *exp* : (Σ_♯)∗ → P(Σ∗). If L ⊆ Σ∗, we define L_♯ ⊆ (Σ_♯)∗ as follows:

$$L\_{\sharp} = \exp^{-1}(\mathcal{P}(L)) = \{ u \in (\Sigma\_{\sharp})^{\*} \mid \exp(u) \subseteq L \}.$$

We define the morphism π : (Σ_♯)∗ → Σ∗ that erases the letters from Θ as follows: π(a) = a if a ∈ Σ, and π(♯_j) = ε for all j ∈ J. Our goal is to prove that cl_H(L) = π(L_♯) and that L_♯ is regular. To prove the first part, we need an alternative presentation of L_♯ as the closure of L under a new set of hypotheses H_♯, which we define as follows:

$$H\_{\sharp} = \{ \sharp\_j \le S\_j \mid j \in J \} \cup \{ \sharp\_j \le 1 \mid j \in J \}.$$

**Lemma 4.** *We have* L_♯ = cl_{H_♯}(L)*. In particular,* L_♯ *is* Θ*-closed.*

See [8, App. B] for a detailed proof of Lemma 4.

**Lemma 5.** cl_H(L) = π(L_♯)*.*

*Proof.* If u ∈ π(L_♯), let v ∈ L_♯ be such that u = π(v). By Lemma 4, there is a derivation tree T_v for v ∈ cl_{H_♯}(L). Erasing all occurrences of the ♯_j in T_v yields a derivation tree for u ∈ cl_H(L).

Conversely, if u ∈ cl_H(L) is witnessed by some derivation tree T_u, we show by induction on T_u that there exists v ∈ L_♯ ∩ π⁻¹(u). If T_u is a single leaf, we have u ∈ L, and it suffices to take v = u.

Otherwise, the rule applied at the root of T_u splits u into u = wz and has premises {wbz | b ∈ [S_j]} for some j ∈ J and w, z ∈ Σ∗. By induction hypothesis, for every b ∈ [S_j] there is v_b ∈ L_♯ ∩ π⁻¹(wbz). Let w = w_1 ... w_n and z = z_1 ... z_m be the decompositions of w, z into letters of Σ. By definition of π, each v_b can be written v_b = α_{b,0} w_1 α_{b,1} ... w_n α_{b,n} b α_{b,n+1} z_1 ... z_m α_{b,n+m+2}, with the α_{b,k} ∈ Θ∗. For each k ∈ [0, n + m + 2], let α_k = Π_{b∈[S_j]} α_{b,k}. Let w′ = α_0 w_1 α_1 ... w_n α_n and z′ = α_{n+1} z_1 ... z_m α_{n+m+2}. By Lemma 4, L_♯ is Θ-closed, so for each b ∈ [S_j] the word v′_b = w′ b z′ is in L_♯, since v′_b is obtained from v_b by adding letters from Θ. We can finally build v = w′ ♯_j z′. We have *exp*(v) = ∪_{b∈[S_j]} *exp*(v′_b) ⊆ L, and π(v) = π(w′)π(z′) = wz = u.

**Lemma 6.** L_♯ *is a regular language, computable effectively.*

*Sketch.* From a DFA A = (Σ, Q, q_0, F, δ) for L, we first build a DFA A_∧ = (Σ, P(Q), {q_0}, P(F), δ_∧), which corresponds to a powerset construction, except that the accepting states are P(F). This means that the semantics of a state P is the conjunction of its members. We then build A_♯ = (Σ_♯, P(Q), {q_0}, P(F), δ_♯) based on A_∧, which can additionally read letters of the form ♯_j, by expanding them using the powerset structure of A_∧.
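On a toy instance the construction can be run directly. The encoding below is ours: L = [a + b] over Σ = {a, b}, the single hypothesis 1 ≤ a + b, and one sharp letter # with exp(#) = {a, b}; a state set P is accepting iff P ⊆ F, giving the conjunctive semantics of A_∧:

```python
# DFA for L = {a, b}: 0 --a,b--> 1 (accepting), then 2 is a sink.
delta = {(0, 'a'): 1, (0, 'b'): 1, (1, 'a'): 2, (1, 'b'): 2,
         (2, 'a'): 2, (2, 'b'): 2}
F = {1}
expand = {'#': 'ab'}  # exp(#) = {a, b}

def sharp_accepts(word):
    """Run A_sharp: state sets evolve conjunctively, sharps expand to unions."""
    P = frozenset({0})
    for ch in word:
        letters = expand.get(ch, ch)  # a sharp letter expands to several letters
        P = frozenset(delta[(q, b)] for q in P for b in letters)
    return P <= F  # conjunctive acceptance: every member must accept

print(sharp_accepts('#'))   # True:  exp(#) = {a, b} is included in L
print(sharp_accepts('a'))   # True:  exp(a) = {a} is included in L
print(sharp_accepts('a#'))  # False: exp(a#) = {aa, ab} is not included in L
```

This matches the definition L_♯ = exp⁻¹(P(L)): a word is accepted exactly when all of its expansions land in L.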

**Lemma 7.** *We can construct a regular expression* c *such that* [c] = cl_H(L) *and* KA_H ⊢ c ≤ f*.*

*Proof.* Let A_♯ be the DFA constructed for L_♯ in the proof of Lemma 6; we reuse the notations of that proof below.

Let π(A_♯) = (Σ, P(Q), {q_0}, P(F), π(δ_♯)) be the NFA obtained from A_♯ by replacing every transition δ_♯(P, ♯_j) = R, where j ∈ J, by an ε-transition π(δ_♯)(P, ε) = R. By Lemma 5, the automaton π(A_♯) recognises the language cl_H(L). Let us construct a regular expression c for this automaton such that KA_H ⊢ c ≤ f.

For every P ∈ P(Q), let f_P be a regular expression such that [f_P] = [P]_{A_∧}. Let ϕ : P(Q) → Exp_Σ be the function mapping each state P of π(A_♯) to ϕ(P) = f_P. Let us show that ϕ is H-compatible.

If P ∈ P(F), then P is a final state of A_∧, so 1 ∈ [f_P], and by completeness of KA, KA ⊢ 1 ≤ f_P. Let (P, a, R) be a transition of π(A_♯). Either a ∈ Σ, so (P, a, R) is a transition of A_∧ and a[f_R] ⊆ [f_P], whence by Theorem 1, KA ⊢ a f_R ≤ f_P. Or a = ε, so there is j ∈ J such that δ_♯(P, ♯_j) = R. This means that R = ∪_{b∈[S_j]} R_b, where δ_∧(P, b) = R_b for all b ∈ [S_j]. We then have b[f_{R_b}] ⊆ [f_P] for all b ∈ [S_j]. Note that R_b ⊆ R for all b ∈ [S_j], so [f_R] ⊆ [f_{R_b}], and hence S_j[f_R] ⊆ [f_P]. By Theorem 1, KA ⊢ S_j f_R ≤ f_P. We also have KA_H ⊢ 1 ≤ S_j, so KA_H ⊢ f_R ≤ S_j f_R ≤ f_P, as required for the ε-transition.

By Proposition 4, we can construct the desired regular expression c.

#### **5 Complexity Results for Letter Hypotheses**

In this section, we give a recursion-theoretic characterisation of KA_H and KA∗_H where H is a set of letter hypotheses or (w ≤ w)-hypotheses. Throughout the section, by "deciding KA_H (resp. KA∗_H)" we mean deciding whether KA_H ⊢ e ≤ f (resp. KA∗_H ⊢ e ≤ f), given e, f, H as input.

These various complexity bounds are obtained by reduction from known problems about Turing machines (TMs) and alternating linearly bounded automata (LBAs), such as the halting problem and universality.

To obtain these reductions, we build on a result bridging TMs and LBAs on the one hand and closures on the other: the set of co-reachable configurations of a TM (resp. LBA) can be seen as the closure of a well-chosen set of hypotheses.

We present this result in Sect. 5.1, and show in Sect. 5.2 how to instantiate it to get our complexity classes.

#### **5.1 Closure and Co-reachable States of TMs and LBAs**

**Definition 6.** *An* alternating Turing machine *over* Σ *is a tuple* M = (Q, Q_F, Γ, ι, B, Δ) *consisting of a finite set of states* Q *with final states* Q_F ⊆ Q*, a finite working alphabet* Γ ⊇ Σ*, an initial state* ι ∈ Q*, a blank symbol* B ∈ Γ*, and a transition function* Δ : (Q \ Q_F) × Γ → P(P({L, R} × Γ × Q))*. Let* #_L, #_R ∉ Γ *be fresh symbols marking the ends of the tape, and* Γ_# = Γ ∪ {#_L, #_R}*.*

*A* configuration *is a word* uqav ∈ #_LΓ∗QΓ⁺#_R*, meaning that the head of the TM points to the letter* a*. We denote by* C *the set of configurations of* M*. A configuration is* final *if it is of the form* #_LΓ∗Q_FΓ⁺#_R*.*

*The execution of the TM* M *over an input* w ∈ Σ∗ *may be seen as a game between two players,* ∃loise *and* ∀belard*, over the graph* C ∪ (C × P({L, R} × Γ × Q))*, with initial position* #_L ι w #_R*. From a position* uqav ∈ C*,* ∃loise *chooses a set* X ∈ Δ(q, a)*, moving to position* (uqav, X)*; then* ∀belard *chooses a triple* (d, c, r) ∈ X*, and the game continues from the configuration:*

	- ucrB#_R *if* v = #_R *and* d = R
	- ucrv *if* v ≠ #_R *and* d = R
	- #_L rBcv *if* u = #_L *and* d = L
	- u′rbcv *if* u = u′b *and* d = L

*Given a subset of configurations* D ⊆ C*, we define the* ∃loise attractor *of* D*, written* Attr_∃loise(D)*, as the set of configurations from which* ∃loise *may force the execution to go through* D*.*

*A* deterministic *TM* M *is one where every* Δ(q, a) ⊆ {{(d, c, r)}} *for some* (d, c, r) ∈ {L, R} × Γ × Q*. In such a case, we may identify* M *with the underlying partial function* [M] : Σ∗ ⇀ Q_F*.*

*An* alternating linearly bounded automaton *over the alphabet* Σ *is a tuple* A = (Q, Q_F, Γ, ι, Δ) *where* (Q, Q_F, Γ ⊎ {B}, ι, B, Δ) *is a TM that does not insert* B *symbols. This means that the head can point to* #_d*, and for every* X ∈ Δ(q, #_d) *and* (d′, a, r) ∈ X*, we have* d′ = d̄ *(the direction opposite to* d*) and* a = #_d*.*

*An LBA is deterministic if its underlying TM is.*

**Definition 7.** *A set of* (w ≤ w)*-hypotheses is said to be* length-preserving *if for every* (v ≤ Σ_{i∈I} v_i) ∈ H*, we have* |v| = |v_i| *for all* i ∈ I*.*

The following lemma generalizes a similar construction from [13].


A configuration c is *co-reachable* if ∃loise has a strategy to reach a final configuration from c. Lemma 8 shows that the set of co-reachable configurations can be seen as a closure by (w ≤ w)-hypotheses. Since we are also interested in (x ≤ x)-hypotheses, we show that (w ≤ w)-hypotheses can be transformed into letter hypotheses; moreover, this transformation preserves the length-preserving property.

**Theorem 5.** *Let* Σ *be an alphabet and* H *a set of (*w ≤ w*)-hypotheses over* Σ*. There exist an extended alphabet* Σ′ ⊇ Σ*, a set of (*x ≤ w*)-hypotheses* H′ *over* Σ′*, and a regular expression* h ∈ Exp_{Σ′} *such that the following holds for every* f ∈ Exp_Σ *and* w ∈ Σ∗*.*

> w ∈ cl_H([f]) *if and only if* w ∈ cl_{H′}([f + h])

*Furthermore, we guarantee the following:*

*–* (Σ′, H′, h) *can be computed in polynomial time from* (Σ, H)*.*
*–* H′ *is length-preserving whenever* H *is.*

#### **5.2 Complexity Results**

**Lemma 9.** *If* H *is a set of length-preserving (*w ≤ w*)-hypotheses (resp. a set of (*x ≤ x*)-hypotheses),* w ∈ Σ∗ *and* f ∈ Exp_Σ*, deciding* KA_H ⊢ w ≤ f *is* ExpTime*-complete.*

*Proof.* We actually show that our problem is complete for alternating polynomial space (APSPACE), which allows us to conclude since EXPTIME and APSPACE coincide. First, notice that by completeness of KA_H over this fragment (Proposition 2), we have KA_H ⊢ w ≤ f ⇔ w ∈ cl_H([f]); hence we work directly with the latter notion. It suffices to show hardness for the (x ≤ x) case and membership for the (w ≤ w) case.

Given an arbitrary alternating Turing machine M running in polynomial space, there exists a polynomial p ∈ N[X] such that the executions of M over a word w are bisimilar to the executions of the associated LBA over wB^{p(|w|)}. Hence, by Lemma 8 and Theorem 5, the problem with (x ≤ x)-hypotheses is APSPACE-hard. Conversely, we show that the problem with (w ≤ w)-hypotheses falls into APSPACE. On input w, the alternating algorithm first checks whether w ∈ [f], in linear time. If so, it returns "yes". Otherwise, it non-deterministically picks a factorisation w = uxv and a hypothesis x ≤ Σ_i y_i. It then universally picks one of the y_i ∈ Σ^{|x|} and replaces x by y_i on the tape, so that the new tape content is w′ = u y_i v. The algorithm then loops back to its first step. In parallel, we keep track of the number of steps and halt by returning "no" as soon as we reach |Σ|^{|w|} steps. This is correct because, if there is a derivation tree witnessing w ∈ cl_H([f]), then there is one in which, on every path, all nodes have distinct labels; the existential player can thus play according to this tree, while the universal player selects a branch.
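The alternating procedure can be sketched as a recursive search for sum-of-letters hypotheses (the encoding and function name are ours; the path set implements the distinct-labels argument in place of the explicit |Σ|^{|w|} counter):

```python
def in_closure(w, L, H, path=frozenset()):
    """Decide w ∈ cl_H(L) for a finite language L and hypotheses H given
    as pairs (a, S): a letter a and a string S of letters, read a <= sum(S)."""
    if w in L:
        return True            # the existential player reached a leaf
    if w in path:
        return False           # revisiting a word on this path cannot help
    path = path | {w}
    for i, a in enumerate(w):
        for lhs, rhs in H:
            # existential choice of a position and a hypothesis; universal
            # choice of the replacing letter b among the summands of S
            if a == lhs and all(in_closure(w[:i] + b + w[i+1:], L, H, path)
                                for b in rhs):
                return True
    return False

H = [('a', 'bc')]              # single hypothesis a <= b + c
print(in_closure('ab', {'bb', 'cb'}, H))  # True:  both bb and cb are in L
print(in_closure('ba', {'bb', 'cb'}, H))  # False: bc is not derivable
```

Since the hypotheses are length-preserving, the search space for a fixed input w is finite, which is what makes the cutoff on repeated words sound.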

**Theorem 6.** *Deciding* KA∗_H *is* Π⁰₁*-complete for (*x ≤ x*)-hypotheses.*

*Proof.* By Lemma 9 and the fact that regular expressions are in recursive bijection with natural numbers, our set is clearly Π⁰₁. To show completeness, we effectively reduce the set of universal LBAs, which is known to be Π⁰₁-complete, to our set of triples. Indeed, by Lemma 8, an LBA A is universal if and only if #_L{ι}Σ∗#_R ⊆ cl_H(C_F), where C_F is the set of final configurations.

**Theorem 7.** *If* H *is a set of* (x ≤ w)*-hypotheses,* w ∈ Σ^∗ *and* f ∈ Exp_Σ*, deciding* KA^(∗)_H ⊢ w ≤ f *is* Σ^0_1*-complete.*

*Proof.* As KA_H is a recursively enumerable theory, our set is Σ^0_1. By the completeness theorem (Proposition 2), we have KA_H ⊢ w ≤ f ⇔ KA∗_H ⊢ w ≤ f ⇔ w ∈ cl_H([f]), so we may work directly with the closure. To show completeness, we reduce the halting problem for Turing machines (on empty input) to this problem. Let M be a Turing machine with alphabet Σ and final state q_f, and let H_M be the set of (w ≤ w)-hypotheses given effectively by Lemma 8. Let f = Σ^∗ q_f Σ^∗; by Lemma 8, M halts on empty input if and only if q_0 ∈ cl_{H_M}([f]). Notice that the hypotheses of H_M are of the form u ≤ V, where u ∈ Θ^3 and V ⊆ Θ^3. By Theorem 5, we can compute a set H′ of (x ≤ x)-hypotheses and an expression h on an extended alphabet such that q_0 ∈ cl_{H_M}([f]) ⇔ q_0 ∈ cl_{H′}([f + h]).

**Theorem 8.** *Deciding* KA∗_H *is* Π^0_2*-complete for* (x ≤ w)*-hypotheses.*

*Proof.* This set is Π^0_2 by Theorem 7. It is complete by reduction from the set of Turing machines accepting all inputs, which is known to be Π^0_2-complete. Indeed, let M be a Turing machine on alphabet Σ with final state q_f. By Lemma 8, we can compute a set of (w ≤ w)-hypotheses H_M with finite languages as second components such that c ∈ cl_{H_M}(c′) if and only if configuration c′ is reachable from c. As before, by Theorem 5, we can compute a set of letter hypotheses H′ with finite languages as second components, and a regular expression h on an extended alphabet, such that cl_{H′}([f + h]) ∩ Θ^∗ = cl_{H_M}([f]) for any f ∈ Exp_Θ. Let C_f = Σ^∗ q_f Σ^∗; we obtain that M accepts all inputs if and only if [q_0 Σ^∗] ⊆ cl_{H′}([C_f + h]), which completes the proof of Π^0_2-completeness.

**Theorem 9.** *Deciding* KA∗_H *is* Π^1_1*-complete for* (x ≤ g)*-hypotheses* (g ∈ Exp_Σ)*.*

*Sketch.* It is shown in [13] that the problem is complete for hypotheses of the form H = H_w ∪ {x ≤ g}, where H_w is a set of length-preserving (w ≤ w)-hypotheses. A slight refinement of Theorem 5 allows us to reduce this problem to hypotheses of the form x ≤ g alone.

#### **5.3 Undecidability of KA_H for Sums of Letters**

Fix an alphabet Σ, a well-behaved coding function ⌜·⌝ of Turing machines with final states {0, 1} into Σ^∗, and a recursive pairing function ⟨·, ·⟩ : Σ^∗ × Σ^∗ → Σ^∗. A *universal total* F : Σ^∗ → {0, 1} is a function such that, for every total Turing machine M and input w ∈ Σ^∗, we have F(⟨⌜M⌝, w⟩) = [M](w). In particular, F must be total, and it is not uniquely determined on codes of partial Turing machines. The next folklore lemma follows from an easy diagonal argument.

**Lemma 10.** *There is no universal total Turing machine.*
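The diagonal argument behind Lemma 10 can be sketched as follows (a standard construction, spelled out here for completeness). Suppose F were universal total and computed by a total Turing machine U. Define a machine D such that, for every w ∈ Σ^∗,

$$[D](w) = 1 - F(\langle w, w \rangle).$$

Since U is total, D is total, so universality applies to D itself. Taking w = ⌜D⌝ gives

$$F(\langle \ulcorner D \urcorner, \ulcorner D \urcorner \rangle) = [D](\ulcorner D \urcorner) = 1 - F(\langle \ulcorner D \urcorner, \ulcorner D \urcorner \rangle),$$

which is impossible since F takes values in {0, 1}.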

Our strategy is to show that decidability of KA_H with (x ≤ x)-hypotheses would imply the existence of a universal total Turing machine. To do so, we need one additional lemma.

**Lemma 11.** *Suppose that* M = (Q, Q_F, Γ, ι, B, Δ) *is a total Turing machine with final states* {0, 1} *and initial state* ι*. Let* w ∈ Σ^∗ *be an input word for* M*. Then there are, effectively, a set of length-preserving* (w ≤ w)*-hypotheses* H *and expressions* e_w, h *such that* [M](w) = 1 *if and only if* KA_H ⊢ e_w ≤ h*.*

**Theorem 10.** KA_H *is undecidable for* (x ≤ x)*-hypotheses.*

*Proof.* Assume that KA_H is decidable. This means that we have an algorithm A taking as input tuples (Σ, w, f, H), with H consisting only of sum-of-letters hypotheses, and returning true when KA_H ⊢ w ≤ f and false otherwise. Without loss of generality, we can assume that A is total. By Theorem 5, we may even provide an algorithm A′ taking as input tuples (w, f, H), where H is a set of length-preserving (w ≤ w)-hypotheses, with a similar behaviour: A′ returns true when KA_H ⊢ w ≤ f and false otherwise.

Given A′, consider the machine M defined so that [M](⟨⌜N⌝, w⟩) = [A′](e_w, h, H), where the last tuple is given by Lemma 11. We show that M is a total universal Turing machine. Since such a machine cannot exist by Lemma 10, this is enough to conclude. Since A′ is total, so is M. For total Turing machines N, Lemma 11 guarantees that [N](w) = 1 if and only if [A′](e_w, h, H) = [M](⟨⌜N⌝, w⟩) = 1. Since both [A′] and [M] are total with codomain {0, 1}, we indeed have [M](⟨⌜N⌝, w⟩) = [N](w).

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Trees in Partial Higher Dimensional Automata**

Jérémy Dubut^{1,2}

¹ National Institute of Informatics, Tokyo, Japan
dubut@nii.ac.jp
² Japanese-French Laboratory for Informatics, Tokyo, Japan

**Abstract.** In this paper, we give a new definition of partial Higher Dimensional Automata using lax functors. This definition is simpler and more natural from a categorical point of view, but also matches more clearly the intuition that pHDA are Higher Dimensional Automata with some missing faces. We then focus on trees. Originally, for example in transition systems, trees are defined as those systems that have a unique path property. To understand what kind of unique path property is needed in pHDA, we start by looking at trees as colimits of paths. This definition tells us that trees are exactly the pHDA with the unique path property modulo a notion of homotopy, and without any shortcuts. This property allows us to prove two interesting characterisations of trees: trees are exactly those pHDA that are an unfolding of another pHDA; and trees are exactly the cofibrant objects, in the language of Quillen's model structures. In particular, this last characterisation gives the premises of a new understanding of concurrency theory using homotopy theory.

**Keywords:** Higher Dimensional Automata · Trees · Homotopy theories

### **1 Introduction**

Higher Dimensional Automata (HDA, for short), introduced by Pratt in [23], are a geometric model of true concurrency: geometric, because they are defined very similarly to simplicial sets and can be interpreted as glueings of geometric objects, here hypercubes of arbitrary dimension. Similarly to other models of concurrency, such as event structures [21], asynchronous systems [1,25], or transition systems with independence [22], they model true concurrency, in the sense that they distinguish interleaving behaviours from simultaneous behaviours. In [12], van Glabbeek proved that they form the most powerful model in a hierarchy of concurrency models. In [6], Fahrenberg described a notion of bisimilarity of HDA using the general framework of open maps from [17]. While this work is very natural,

The author was supported by ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), JST.

© The Author(s) 2019

M. Bojańczyk and A. Simpson (Eds.): FOSSACS 2019, LNCS 11425, pp. 224–241, 2019. https://doi.org/10.1007/978-3-030-17127-8_13

it is confronted with a design problem: paths (or executions) cannot be nicely encoded as HDA. Indeed, in a HDA, it is impossible to model the fact that two actions *must* be executed at the same time, or that two actions are executed at the same time but one *must* start before the other. From a geometric point of view, these impossibilities are expressed by the fact that we deal with closed cubes, that is, cubes that must contain all of their faces. Motivated by such examples, Fahrenberg, in [7], extended HDA to partial HDA: intuitively, HDA whose cubes may have some missing faces. While the intuition is clear, the formalisation is harder to achieve: the definition from [7] misses the point that faces can fail to be uniquely defined. This comes from the fact that Fahrenberg wanted to stick to the 'local' definition of precubical sets, that is, that cubes must satisfy some local conditions about faces. As we will show, those local equations are not enough in the partial case. Another missing point is the notion of morphism of partial HDA: as defined in [7], the natural property that morphisms map executions to executions is not satisfied. In Sect. 2, we address these issues by giving a new definition of partial HDA in terms of lax functors. This definition, similar to the presheaf-theoretic definition of HDA, avoids the issues discussed above by considering global inclusions instead of local equations. It illustrates more clearly the intuition of partial HDA being HDA with missing faces: we coherently replace sets and total functions by sets and partial functions. From this similarity with the original definition of HDA, we can prove that it is possible to complete a partial HDA to turn it into a HDA, by adding the missing faces, and that from this completion it is possible to define a geometric realisation of pHDA (which was impossible with Fahrenberg's definition).

The geometry of Higher Dimensional Automata, and more generally, of true concurrency, has been studied since Goubault's PhD thesis [13]. Since then, numerous pieces of work relating algebraic topology and true concurrency have been carried out (for example, see the textbooks [9,14]). In particular, some attempts at defining nice homotopy theories for true concurrency (or directed topology), through the language of model structures of Quillen [24], have been made by Gaucher [10] and the author [3]. In the second part of this paper (Sects. 3, 4 and 5), we consider another point of view on this relationship between HDA and model structures. The goal is not to understand the true concurrency of HDA, that is, to understand the homotopy theory of HDA as an abstract homotopy theory, but to understand the concurrency theory of HDA. By this we mean understanding how paths (or executions) and extensions of paths can be understood using (co)fibrations (in Quillen's sense). Also, the goal is not to construct a model structure, as Quillen's axioms would fail, but to give intuitions and some preliminary formal statements toward the understanding of concurrency using homotopy theory. From this point of view, many constructions in concurrency can be understood using the language of model structures:

– Open maps from [17] can be understood as trivial fibrations, namely weak equivalences (here, bisimulations) that have the right lifting properties with respect to some morphisms.


The main ingredient is to understand what trees are in this context. In the case of transition systems for the semantics of CCS [19], synchronisation trees are those systems with exactly one path from the initial state to any state. Such trees are much simpler to reason about, but they are still powerful enough to capture any bisimulation type: by unfolding, it is possible to canonically construct a tree from a system. The goal of Sects. 3 and 4 is to understand how to generalise this to pHDA. In this context, it is not clear what kind of unique path property should be considered since, in general, in truly concurrent systems, we have to deal with homotopies, namely, equivalences of paths modulo permutation of independent actions. Following [4], we will first consider trees as colimits of paths. This will guide us to determine what kind of unique path property is needed: a tree is a pHDA with exactly one class of paths modulo a notion of homotopy, from the initial state to any state, and without any shortcuts. This will be proved by defining a suitable notion of unfolding of pHDA. Finally, in Sect. 5, we prove that those trees coincide exactly with the cofibrant objects, illustrating the first steps of this new understanding of concurrency using homotopy theory.

### **2 Fixing the Definition of pHDA**

In this Section, we review the definitions of HDA (Sect. 2.1), the first one using face maps, and the second one using presheaves. In Sect. 2.2, we describe the definition of partial HDA from [7] and explain why it does not give us what we are expecting. We tackle those issues by introducing a new definition in Sect. 2.3, extending the presheaf theoretic definition, using lax functors instead of strict functors. Finally, in Sect. 2.4, we prove that HDA form a reflective subcategory of partial HDA, by constructing a completion of a partial HDA.

#### **2.1 Higher Dimensional Automata**

Higher Dimensional Automata are an extension of transition systems: they are labelled graphs, except that, in addition to vertices and edges, the graph structure also carries higher-dimensional data, expressing the fact that several actions can be performed at the same time. These additional data are, intuitively, cubes filling in interleavings: if a and b can be performed at the same time, instead of having an empty square as on the left figure, with a.b and b.a as the only behaviours, we have a full square as on the right figure, with every possible behaviour in-between. This requires extending the notion of graph to include such higher-dimensional cubical data: that is the notion of *precubical sets*.

**Concrete Definition of Precubical Sets.** A **precubical set** X is a collection of sets (X_n)_{n∈N} together with a collection of functions (∂^α_{i,n} : X_n → X_{n−1})_{n>0, 1≤i≤n, α∈{0,1}} satisfying the local equations ∂^α_{i,n} ∘ ∂^β_{j,n+1} = ∂^β_{j,n} ∘ ∂^α_{i+1,n+1} for every α, β ∈ {0, 1}, n > 0 and 1 ≤ j ≤ i ≤ n. A **morphism of precubical sets** from X to Y is a collection of functions (f_n : X_n → Y_n)_{n∈N} satisfying the equations f_n ∘ ∂^α_{i,n+1} = ∂^α_{i,n+1} ∘ f_{n+1} for every n ∈ N, 1 ≤ i ≤ n + 1 and α ∈ {0, 1}. The elements of X_0 are called **points**, those of X_1 **segments**, those of X_2 **squares**, those of X_n n**-cubes**. In the following, we will call **past** (resp. **future**) i**-face maps** the ∂^0_{i,n} (resp. ∂^1_{i,n}). We denote this category of precubical sets by **pCub**.

**Precubical Sets as Presheaves.** Equivalently, **pCub** is the category of presheaves over the cubical category □: □ is the subcategory of **Set** whose objects are the sets {0, 1}^n for n ∈ N and whose morphisms are generated by the so-called **coface maps**:

$$d\_{i,n}^{\alpha} : \{0,1\}^{n-1} \longrightarrow \{0,1\}^n \quad (\beta\_1, \dots, \beta\_{n-1}) \longmapsto (\beta\_1, \dots, \beta\_{i-1}, \alpha, \beta\_i, \dots, \beta\_{n-1})$$

A precubical set is a functor X : □^op → **Set**, that is, a presheaf over □, and a morphism of precubical sets is a natural transformation.
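As a quick sanity check, the coface maps and the relation d^β_j ∘ d^α_i = d^α_{i+1} ∘ d^β_j for j ≤ i (the dual of the local equations of precubical sets) can be verified mechanically. The following Python sketch (our illustration; function names are ours) encodes a vertex of {0,1}^n as a tuple of bits:

```python
from itertools import product

def coface(i, alpha):
    """Coface map d^α_{i} : {0,1}^{n-1} → {0,1}^n, inserting the bit
    alpha at (1-indexed) position i of a tuple."""
    def d(t):
        return t[:i - 1] + (alpha,) + t[i - 1:]
    return d

def check_cocubical(n):
    """Check d^β_j ∘ d^α_i = d^α_{i+1} ∘ d^β_j for all j ≤ i on all
    tuples of length n (both composites land in {0,1}^{n+2})."""
    for t in product((0, 1), repeat=n):
        for i in range(1, n + 2):
            for j in range(1, i + 1):
                for a, b in product((0, 1), repeat=2):
                    lhs = coface(j, b)(coface(i, a)(t))
                    rhs = coface(i + 1, a)(coface(j, b)(t))
                    if lhs != rhs:
                        return False
    return True
```

For instance, `coface(2, 1)((0, 0))` is `(0, 1, 0)`: the bit 1 is inserted at position 2.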

**Higher Dimensional Automata** [11]**.** From now on, fix a set L, called the **alphabet**. We can form a precubical set, also noted L, such that L_n = L^n and the i-face maps are given by δ^α_i(a_1 … a_n) = a_1 … a_{i−1}.a_{i+1} … a_n. We can also form the precubical set ∗ such that ∗_0 = {∗} and ∗_n = ∅ for n > 0. A **HDA** X on L is a bialgebra ∗ → X → L in **pCub**. In other words, a HDA X is a precubical set, also noted X, together with a specified point, the **initial state** i ∈ X_0, and a **labelling function** λ : X_1 → L satisfying the equations λ ∘ ∂^0_{i,2} = λ ∘ ∂^1_{i,2} for i ∈ {1, 2} (see previous figure, right). A **morphism of HDA** from X to Y is a morphism f of precubical sets from X to Y such that f_0(i_X) = i_Y and λ_X = λ_Y ∘ f_1. HDA on L and morphisms of HDA form a category that we denote by **HDA_L**. This category can also be defined as the double slice category ∗/**pCub**/L. Remark that we are only concerned with labelling-preserving morphisms, not the general morphisms described in [5].

#### **2.2 Original Definition of Partial Higher Dimensional Automata**

Originally [7], partial HDA are defined similarly to the concrete definition of HDA, except that the face maps can be partial functions and the local equations hold only when *both* sides are well defined. There are two reasons why this fails to capture the right intuition:

– first, the 'local' equations are not enough in the partial case. Imagine that we want to model a full cube c without its lower face, that is, ∂^0_{3,3} is not defined on c, and such that ∂^1_{1,2} is undefined on ∂^1_{1,3}(c) and ∂^1_{2,3}(c), that is, we remove an edge. We cannot prove using the local equations that ∂^1_1 ∘ ∂^0_2 ∘ ∂^1_1(c) = ∂^1_1 ∘ ∂^0_2 ∘ ∂^1_2(c), that is, that the vertices of the cube are uniquely defined. Indeed, to prove this equality using the local equations, you can only permute two consecutive ∂. From ∂^1_1 ∘ ∂^0_2 ∘ ∂^1_1(c), you can:


and in both cases the resulting faces are not defined. On the other hand, those two composites should be equal, because the comaps d^1_1 ∘ d^0_2 ∘ d^1_1 and d^1_2 ∘ d^0_2 ∘ d^1_1 are equal in □, and ∂^1_1 ∘ ∂^0_2 ∘ ∂^1_1 and ∂^1_1 ∘ ∂^0_2 ∘ ∂^1_2 are both defined on c.

– secondly, the notion of morphism is not the right one (or at least, is ambiguous). The equations f_n ∘ ∂^α_{i,n,X} = ∂^α_{i,n,Y} ∘ f_{n+1} hold in [7] only when *both* face maps are defined, which authorises too many morphisms. For example, consider the segment I, and the 'split' segment I′, which is defined as I except that no face maps are defined (geometrically, this corresponds to two points and an open segment). The

identity map from I to I′ is a morphism of partial precubical sets in the sense of [7], which is unexpected. A bad consequence is that the notion of path in a partial HDA does not correspond to morphisms from particular partial HDA, and that paths are not preserved by morphisms, as we will see later.

#### **2.3 Partial Higher Dimensional Automata as Lax Functors**

The idea is to generalise the 'presheaf' definition of precubical sets. The problem is how to deal with partial functions and when two of them should coincide. Let **pSet** be the category of sets and partial functions. A partial function f : X → Y can be seen either as a pair (A, f) of a subset A ⊆ X and a total function f : A → Y, or as a functional relation f ⊆ X × Y, that is, a relation such that for every x ∈ X there is at most one y ∈ Y with (x, y) ∈ f. We will freely use both views in the following. For two partial maps f, g : X → Y, we write f ≡ g if and only if for every x ∈ X such that f(x) and g(x) are both defined, f(x) = g(x). Note that this is not equality, but equality on the intersection of the domains. We also write f ⊆ g if and only if f is included in g as a relation, that is, if and only if, for every x ∈ X such that f(x) is defined, g(x) is defined and f(x) = g(x). By a **lax functor** F : C ⇀ **pSet**, we mean the following data [20]:


satisfying F(id_c) = id_{F(c)} and F(j) ∘ F(i) ⊆ F(j ∘ i).
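The relations ≡ and ⊆ on partial maps, and the composition appearing in the lax-functor condition, can be illustrated on partial maps encoded as Python dictionaries (a sketch of ours; the names `compose`, `agree` and `refines` are hypothetical, not from the paper):

```python
def compose(g, f):
    """Composition g ∘ f of partial maps encoded as dicts:
    defined on x iff f(x) is defined and g is defined at f(x)."""
    return {x: g[f[x]] for x in f if f[x] in g}

def agree(f, g):
    """f ≡ g: f and g coincide on the intersection of their domains."""
    return all(f[x] == g[x] for x in f.keys() & g.keys())

def refines(f, g):
    """f ⊆ g as relations: wherever f is defined, g is defined and
    takes the same value -- the inclusion used in the lax condition
    F(j) ∘ F(i) ⊆ F(j ∘ i)."""
    return all(x in g and g[x] == f[x] for x in f)
```

Note that ≡ is symmetric but not transitive (two maps with disjoint domains agree with anything), whereas ⊆ is a partial order on partial maps.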

The point is that partial precubical sets as defined in [7] do not satisfy the second condition, while they should. In addition, this definition authorises a square to have vertices (that is, some ∂∂ are defined) while having no edges (that is, no ∂ is defined). This may be useful to define paths as the discrete traces of [8] (which we will call *shortcuts* later), that is, paths that can go directly from a point to a square, for example. Observe also that if j ∘ i = j′ ∘ i′, then F(j) ∘ F(i) ≡ F(j′) ∘ F(i′), which gives us the local equations from [7]. A **partial precubical set** X is then a lax functor F : □^op ⇀ **pSet**. It becomes harder to describe explicitly what a partial precubical set is, since we cannot restrict to the ∂^α_i anymore. It is a collection of sets (X_n)_{n∈N} together with a collection of *partial* functions (∂^{α_1,…,α_k}_{i_1<…<i_k} : X_{n+k} → X_n) satisfying the inclusions ∂^{β_1,…,β_m}_{j_1<…<j_m} ∘ ∂^{α_1,…,α_n}_{i_1<…<i_n} ⊆ ∂^{γ_1,…,γ_{n+m}}_{k_1<…<k_{n+m}}, where the k_s and γ_s are given by (k_1 < … < k_{n+m}; γ_1, …, γ_{n+m}) = (i_1 < … < i_n; α_1, …, α_n) ∗ (j_1 < … < j_m; β_1, …, β_m), and ∗ is defined by induction on n + m:


A **function-valued op-lax transformation** [20] from F : C ⇀ **pSet** to G : C ⇀ **pSet** is a collection (f_c)_{c∈Ob(C)} of *total* functions such that for every i : c → c′, f_{c′} ∘ F(i) ⊆ G(i) ∘ f_c. A **morphism of partial precubical sets** from X to Y is then a function-valued op-lax transformation. In other words, it is a collection of *total* functions (f_n : X_n → Y_n)_{n∈N} satisfying the inclusions f_n ∘ ∂^{α_1,…,α_k}_{i_1<…<i_k} ⊆ ∂^{α_1,…,α_k}_{i_1<…<i_k} ∘ f_{n+k}. Partial precubical sets and morphisms of partial precubical sets form a category that we denote by **ppCub**. **pCub** is a full subcategory of **ppCub**; in particular, the precubical sets ∗ and L are partial precubical sets. A **partial HDA** X on L is a partial precubical set, also noted X, together with a specified point, the **initial state** i ∈ X_0, and a morphism of **ppCub**, the **labelling function** (λ_n : X_n → L_n)_{n∈N}. A **morphism of pHDA** from X to Y is a morphism f of partial precubical sets from X to Y such that f_0(i_X) = i_Y and λ_X = λ_Y ∘ f. Partial HDA on L and morphisms of partial HDA form a category that we denote by **pHDA_L**; in other words, this is the double slice category ∗/**ppCub**/L.

#### **2.4 Completion of a pHDA**

Let us describe how it is possible to construct a HDA from a pHDA X by 'completing' X, that is, by adding the faces that are missing and by connecting the faces that are not yet connected. Let

$$Y\_n = \{ ((i\_1 < \dots < i\_k; \alpha\_1, \dots, \alpha\_k), x) \mid x \in X\_{n+k} \land i\_k \le n+k \}$$

Y = (Yn)n∈<sup>N</sup> is intuitively the collection of all abstract faces of all cubes of X, that is, pairs of a cube and all possible ways to define a face from it. Of course, some of those are the same, since there are several ways to describe a cube as the face of some other cube. Define ∼ as the smallest equivalence relation such that:

– if ∂^{α_1,…,α_k}_{i_1<…<i_k}(x) is defined, then ((i_1 < … < i_k; α_1, …, α_k), x) ∼ ((ϵ), ∂^{α_1,…,α_k}_{i_1<…<i_k}(x)), where ϵ denotes the empty sequence.

This means that, if a face of a cube exists in X, this face is identified with both abstract faces ((ϵ), ∂^{α_1,…,α_k}_{i_1<…<i_k}(x)) (i.e., the cube ∂^{α_1,…,α_k}_{i_1<…<i_k}(x) itself) and ((i_1 < … < i_k; α_1, …, α_k), x) (i.e., the face of x obtained by taking the (i_k, α_k) face, then the (i_{k−1}, α_{k−1}) face, and so on).

– if ((i_1 < … < i_k; α_1, …, α_k), x) ∼ ((j_1 < … < j_l; β_1, …, β_l), y), then ((i_1 < … < i_k; α_1, …, α_k) ∗ (i, α), x) ∼ ((j_1 < … < j_l; β_1, …, β_l) ∗ (i, α), y). This means that if two abstract faces coincide, then taking their (i, α) faces gives two abstract faces that also coincide.

Let χ(X)_n = Y_n/∼, and denote by ⟨(i_1 < … < i_k; α_1, …, α_k), x⟩ the equivalence class of ((i_1 < … < i_k; α_1, …, α_k), x) modulo ∼. We define the i-face maps by ∂^α_i(⟨(i_1 < … < i_k; α_1, …, α_k), x⟩) = ⟨(i_1 < … < i_k; α_1, …, α_k) ∗ (i, α), x⟩, the initial state as ⟨(ϵ), i⟩, and the labelling function by λ(⟨(i_1 < … < i_k; α_1, …, α_k), x⟩) = δ^{α_1}_{i_1} ∘ … ∘ δ^{α_k}_{i_k}(λ(x)).

**Theorem 1.** χ *is a well-defined functor and is left adjoint to* τ*, the injection of* **HDA_L** *into* **pHDA_L***. Furthermore,* **HDA_L** *is a reflective subcategory of* **pHDA_L***.*

Now, we can define the **geometric realisation** of a pHDA X as the subspace of the realisation of χ(X) consisting of the points whose carrier is of the form ⟨(ϵ), x⟩ for some x ∈ X. This really corresponds to the drawings we have been using to depict pHDA until now.

### **3 Paths in Partial Higher Dimensional Automata**

Executions of HDA are defined using the notion of paths, which describe the succession of starts and finishes of actions in a HDA. For example, a HDA can start an action, then start another one at the same time, and then finish both actions. Such a sequence is not just a sequence of 1-dimensional transitions, since some actions can be performed at the same time, but a sequence of hypercubes corresponding to the evolution of the state of the system. We will formalise this idea in Sect. 3.2, and we will see in particular that those paths can be encoded in the category **pHDA_L** (while this is not possible in the category **HDA_L**) as morphisms from particular pHDA, called path shapes. In Sect. 3.1, we first recall the general framework of [17].

#### **3.1 Path Category, Open Maps, Coverings**

In the general framework of [17], we start with a category M of systems, together with a subcategory P of execution shapes. For example, keep in mind the case where M is the category of transition systems and P is the full subcategory of finite linear systems. One interesting remark about this case is that the executions of a given system are in bijective correspondence with the morphisms from finite linear systems to this system. This means that, to reason about the behaviours of such systems, it is enough to reason about morphisms and execution shapes.

This idea was formalised by describing precisely which morphisms are witnesses for the existence of a bisimulation between systems. This description uses right lifting properties: we say that a morphism f : X → Y has the **right lifting property with respect to** g : X′ → Y′ if for every x : X′ → X and y : Y′ → Y such that f ∘ x = y ∘ g, there exists θ : Y′ → X such that x = θ ∘ g and f ∘ θ = y. For example, let us assume that f is a

morphism of transition systems and that X′ and Y′ are finite linear systems. Then x (resp. y) is the same as an execution in X (resp. Y), and f ∘ x = y ∘ g means that the execution y is an extension of the image of the execution x under f. The right lifting property means that the longer execution y of Y can be lifted to a longer execution θ of X, that is, θ is an extension of x and the image of θ under f is y. This property of lifting longer executions is precisely the property needed on a morphism to make its graph relation a bisimulation. Such morphisms are also very similar to morphisms of coalgebras [16]. We call a morphism P**-open** (or simply open when P is clear) if it has the right lifting property with respect to every morphism in P. Using open maps, it is possible to describe similarity and bisimilarity as the existence of a span of morphisms/open maps, and many kinds of bisimilarity can be captured in this way [17]. An open map is said to be a P**-covering** (or simply a covering) if, furthermore, the lifts in the right lifting properties are unique. Being a covering is a very strong requirement, as coverings correspond to partial unfoldings of a system.

#### **3.2 Encoding Paths in pHDA**

In this section, we describe the classical notion of execution of HDA from [12], extended to partial HDA in [7], defined using the notion of path. We then show

that those executions can be encoded as an execution-shape subcategory, as in the general framework of [17], proving in particular that paths are in bijective correspondence with a class of morphisms. A **path** π of a HDA X is a sequence i = x_0 →^{j_1,α_1} x_1 →^{j_2,α_2} … →^{j_n,α_n} x_n where x_k ∈ X, j_k > 0 and α_k ∈ {0, 1} are such that, for every k:


This definition extends easily to pHDA, by requiring that the j_k-face maps be defined on x_k or x_{k−1}. A natural property of executions and morphisms is that morphisms map executions to executions. This is the case here (while it is not for [7], e.g., for the split segment):

**Proposition 1.** *If* f : X → Y *is a map of pHDA and* π = x_0 →^{j_1,α_1} x_1 →^{j_2,α_2} … →^{j_n,α_n} x_n *is a path in* X*, then* π′ = f(x_0) →^{j_1,α_1} f(x_1) →^{j_2,α_2} … →^{j_n,α_n} f(x_n) *is a path in* Y*.*

One advantage of considering pHDA instead of HDA is that paths can be encoded in pHDA, which is not really possible in HDA. It is done as follows. A **spine** σ is a sequence (0, ϵ) = (d_0, w_0) →^{j_1,α_1} (d_1, w_1) →^{j_2,α_2} … →^{j_n,α_n} (d_n, w_n) where j_k > 0, d_k ∈ N, w_k ∈ L^{d_k} and α_k ∈ {0, 1} are such that:

– if $\alpha_k = 0$, then $d_{k-1} = d_k - 1$, $\delta_{j_k}(w_k) = w_{k-1}$ and $j_k \le d_k$,
– if $\alpha_k = 1$, then $d_k = d_{k-1} - 1$, $\delta_{j_k}(w_{k-1}) = w_k$ and $j_k \le d_{k-1}$.
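The two conditions above can be checked mechanically. A minimal sketch, under assumptions not fixed by the text: a label $w \in L^{d}$ is a length-$d$ tuple and the labelling face map $\delta_j$ deletes the $j$-th letter (one common convention for labelled HDA).

```python
# Sketch of the spine conditions. Assumptions (ours, not the paper's): a label
# w in L^d is a length-d tuple, and delta_j deletes the j-th letter.

def delta(j, w):
    """delta_j : L^d -> L^(d-1), deleting the j-th letter (1-indexed)."""
    return w[:j - 1] + w[j:]

def is_spine(steps):
    """steps = [((d_{k-1}, w_{k-1}), (j_k, a_k), (d_k, w_k)), ...]."""
    if steps[0][0] != (0, ()):        # a spine starts at (0, empty word)
        return False
    for (dp, wp), (j, a), (dk, wk) in steps:
        if j < 1:
            return False
        if a == 0:                    # starting a transition: dimension goes up
            if not (dp == dk - 1 and delta(j, wk) == wp and j <= dk):
                return False
        else:                         # finishing it: dimension goes down
            if not (dk == dp - 1 and delta(j, wp) == wk and j <= dp):
                return False
    return True

# The spine of the path "start transition a, then finish it":
sigma = [((0, ()), (1, 0), (1, ('a',))),
         ((1, ('a',)), (1, 1), (0, ()))]
print(is_spine(sigma))  # True
```

The checker walks the spine step by step, so an ill-formed step (e.g. finishing a transition from the initial point) is rejected immediately.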

A path π has an underlying spine $\sigma_\pi$, obtained by mapping each $x_k$ to the pair of its dimension and its label. A spine σ induces a pHDA Bσ as follows:

	- if $\alpha_k = 0$, then $\partial^0_{j_k}(k) = k - 1$,
	- if $\alpha_k = 1$, then $\partial^1_{j_k}(k - 1) = k$,
	- $\partial^{\beta_1,\dots,\beta_m}_{j_1<\dots<j_m} \circ \partial^{\alpha_1,\dots,\alpha_n}_{i_1<\dots<i_n} \subseteq \partial^{\gamma_1,\dots,\gamma_{n+m}}_{k_1<\dots<k_{n+m}}$, where $(k_1,\dots,k_{n+m}; \gamma_1,\dots,\gamma_{n+m})$ is the merge of $(i_1,\dots,i_n; \alpha_1,\dots,\alpha_n)$ and $(j_1,\dots,j_m; \beta_1,\dots,\beta_m)$.


By a **path shape**, we mean such a pHDA Bσ. The set **Spine<sub>L</sub>** of spines can be partially ordered by the prefix order. B can then be extended to an embedding from **Spine<sub>L</sub>** to **pHDA<sub>L</sub>**. We denote by **PS<sub>L</sub>** the image of this embedding, i.e., the full subcategory of path shapes.
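The first two clauses defining Bσ are local enough to sketch in code. In the sketch below (representation ours), the cells of Bσ are the stages 0..n of the spine and only the face maps forced by those two clauses are recorded; the extension to iterated faces is omitted.

```python
# Sketch: the cells of B(sigma) are 0..n (cell k is the k-th stage of the
# spine), and only the face maps forced by the two local clauses are defined.
# The extension to iterated face maps is omitted here.

def path_shape(arrows):
    """arrows = [(j1, a1), ..., (jn, an)]; returns the partial face maps
    as a dict keyed by (alpha, j, cell)."""
    faces = {}
    for k, (j, a) in enumerate(arrows, start=1):
        if a == 0:
            faces[(0, j, k)] = k - 1      # d^0_{j_k}(k) = k - 1
        else:
            faces[(1, j, k - 1)] = k      # d^1_{j_k}(k - 1) = k
    return faces

# Path shape of: start transition 1, start transition 2, finish transition 1.
B = path_shape([(1, 0), (2, 0), (1, 1)])
print(B)  # {(0, 1, 1): 0, (0, 2, 2): 1, (1, 1, 2): 3}
```

Every other face map of Bσ stays undefined, which is exactly what partiality of pHDA allows.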

**Proposition 2.** *There is a bijection between paths in a pHDA* X *and morphisms of pHDA from a path shape to* X*.*

Again, this is not true with the definition of morphisms from [7] (e.g., the split segment). As an example, the red path π above corresponds to a morphism from the path shape Bσ to X.

#### **4 Trees and Unfolding in pHDA**

In this section, we introduce our notion of trees. Following [4], we consider trees as colimits (or glueings) of paths. Section 4.1 is dedicated to proving that those colimits actually exist, by giving an explicit construction of them. From this explicit construction, we describe in Sect. 4.2 the kind of unique path properties satisfied by those trees. After first showing that the strict unicity of paths fails, we describe a notion of homotopy, confluent homotopy, weaker than the one from [12], for which every tree has the property that there is exactly one homotopy class of paths from the initial state to any state. We will also see that, because the face maps of trees are defined in a local way, trees do not have any shortcuts, that is, paths that 'skip' dimensions, for example going from a point to a square without going through a segment. Finally, in Sect. 4.3, we prove that those two properties – the unicity of paths modulo confluent homotopy and the non-existence of shortcuts – completely characterise trees. The proof uses a suitable notion of unfolding of pHDA, showing furthermore that trees form a coreflective subcategory of pHDA.

#### **4.1 Trees, as Colimits of Paths in pHDA**

In this section, we give an explicit construction of colimits of diagrams with values in path shapes. Those will be our first definition of trees in pHDA, following [4]. Let $D : C \to \mathbf{PS}_L$ be a small diagram with values in $\mathbf{PS}_L$, that is, a functor from $C$ to $\mathbf{PS}_L$. Let us fix some notation: for every object $u$ of $C$, $Du = B\sigma^u$ with $\sigma^u = (d^u_0, w^u_0) \xrightarrow{j^u_1,\alpha^u_1} (d^u_1, w^u_1) \xrightarrow{j^u_2,\alpha^u_2} \dots \xrightarrow{j^u_{l_u},\alpha^u_{l_u}} (d^u_{l_u}, w^u_{l_u})$. The definition of the colimit col $D$ proceeds in two steps. The first step consists in putting all the paths $Du$ side by side, and glueing them together along the morphisms $Df$, for every morphism $f$ of $C$. This is done as follows. Define $(X_n)_{n\in\mathbb{N}}$ to be:

– $X_0 = \{(u, k) \mid u \in C,\ k \le l_u \wedge d^u_k = 0\} \uplus \{*\}$,
– $X_n = \{(u, k) \mid u \in C,\ k \le l_u \wedge d^u_k = n\}$ for $n > 0$.

We quotient X<sup>n</sup> by the smallest equivalence relation ∼ (for inclusion) such that:

– for every $u$, $(u, 0) \sim *$,
– if $i : u \to v \in C$ and $k \le l_u, l_v$, then $(u, k) \sim (v, k)$.

We denote by Y<sup>n</sup> the quotient Xn/ ∼, and by [u, k] the equivalence class of (u, k) modulo ∼.

At this stage, we still do not have the colimit because it is not possible to define the face maps. Let us consider the following example.

A, B and C are path shapes, and we would like to compute their pushout. The expected outcome is D, since we must identify the three squares by the previous construction. The problem is that the previous construction does not identify $\beta_1$ and $\beta_2$. Those two must be identified, because they are both the top-right corner of the same square (after identification). We hence need to quotient a little more in order to define the face maps, as follows. Define $Z_n$ to be the quotient of $Y_n$ by the smallest equivalence relation ≈ such that if there are two sequences $u_0,\dots,u_l$ and $v_0,\dots,v_l$ such that:


The initial state is $*$ and the labelling $\lambda : \mathrm{col}\, D \to L$ maps $[u, k]$ to $w^u_k$.

**Proposition 3.** *col D is the colimit of D in pHDA<sub>L</sub>.*

By **tree** we mean any pHDA that is the colimit of a diagram with values in path shapes. We denote by **Tr<sup>L</sup>** the full subcategory of trees.

#### **4.2 The Unique Path Properties of Trees**

**Failure of the Unicity of Paths.** Let us consider the pushout square above again. In particular, the pHDA in the bottom-right corner is a tree, by definition. However, there are two paths from α to β (in red and blue). This actually comes from the fact that we needed to identify $\beta_1$ and $\beta_2$ to be able to define the face maps. This means that trees do not have the unique path property.

**Confluent Homotopy.** A careful reader may have observed that the only difference between the two previous paths is that some future faces are swapped. Actually, this is the only obstacle to the unicity of paths for trees: there is a unique path modulo the equivalence of paths that permutes consecutive arrows of the form $\xrightarrow{j,1}$. That is what we call **confluent homotopy**. It is defined by restricting the elementary homotopies of [12] to only one of the four possible types, which means our notion of homotopy makes fewer paths equivalent than the one from [12].
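Why permuting future faces is harmless can be seen concretely in the standard "interval word" model of the cube, where a $d$-cell of the $n$-cube is a word over $\{0, 1, I\}$ with $d$ occurrences of $I$, and $\partial^\alpha_j$ substitutes $\alpha$ for the $j$-th $I$. This coordinate model is our assumption for illustration; the paper works abstractly.

```python
# Illustration of why swapping future (alpha = 1) faces is harmless, in the
# interval-word model of the n-cube: a d-cell is a word over {'0','1','I'}
# with d occurrences of 'I', and face(w, j, a) sets the j-th 'I' to a.

def face(w, j, a):
    pos = [i for i, c in enumerate(w) if c == 'I'][j - 1]
    return w[:pos] + str(a) + w[pos + 1:]

square = 'II'                                # the full square, a 2-cell
# two ways to reach its top-right corner through upper faces:
via_right = face(face(square, 2, 1), 1, 1)   # d^1_1 after d^1_2
via_top   = face(face(square, 1, 1), 1, 1)   # d^1_1 after d^1_1
print(via_right, via_top)  # 11 11
```

The two upper-face step sequences land on the same corner of the square; this is exactly the situation of $\beta_1$ and $\beta_2$ above, and the reason trees can only have unique paths up to such swaps.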

We say that a path $\pi = x_0 \xrightarrow{j_1,\alpha_1} x_1 \xrightarrow{j_2,\alpha_2} \dots \xrightarrow{j_n,\alpha_n} x_n$ is **elementary confluently homotopic** to a path $\pi' = x'_0 \xrightarrow{j'_1,\alpha'_1} x'_1 \xrightarrow{j'_2,\alpha'_2} \dots \xrightarrow{j'_n,\alpha'_n} x'_n$, denoted $\pi \sim^1_{ch} \pi'$, if and only if there are $0 < s < t \le n$ such that:

– for all $k < s$ or $k \ge t$, $x_k = x'_k$,
– for all $k < s$ or $k > t$, $j_k = j'_k$ and $\alpha_k = \alpha'_k$,
– for all $s \le k \le t$, $\alpha_k = \alpha'_k = 1$,
– the sequences $(j_s, \alpha_s) \dots (j_t, \alpha_t)$ and $(j'_s, \alpha'_s) \dots (j'_t, \alpha'_t)$ have the same merge, i.e., the composites $\partial^1_{j_t} \circ \dots \circ \partial^1_{j_s}$ and $\partial^1_{j'_t} \circ \dots \circ \partial^1_{j'_s}$ coincide.

We denote by $\sim_{ch}$, and call **confluent homotopy**, the reflexive and transitive closure of $\sim^1_{ch}$.

**Lemma 1.** *If* X *is a tree, then for every element (of any dimension)* x *of* X*, there is exactly one path modulo confluent homotopy from the initial state to* x*.*

**Shortcuts.** The face maps of path shapes, and of the colimits we computed in Sect. 4.1, are of a very particular form: we start by defining the $\partial^\alpha_j$ and extend this definition to the general $\partial^{\alpha_1,\dots,\alpha_n}_{j_1<\dots<j_n}$. In a way, they are defined locally and then extended to higher face maps. This means in particular that, in addition to having unique paths modulo confluent homotopy, trees do not have any 'shortcut'. A possible shortcut can be defined as a generalisation of paths in which we allow transitions that go, for example, from a point to a square or to a cube, not only to segments; a shortcut is such a possible shortcut which is not confluently homotopic to a path. Shortcuts may occur in a pHDA even if it has the unique path property. Concretely, by **shortcut** we mean the following situation: the face $\partial^{\alpha_1,\dots,\alpha_n}_{i_1<\dots<i_n}(x)$ is defined, but there is no sequence $(j_1; \beta_1) \dots (j_n; \beta_n)$ merging to $(i_1 < \dots < i_n; \alpha_1,\dots,\alpha_n)$ such that $\partial^{\beta_n}_{j_n} \circ \dots \circ \partial^{\beta_1}_{j_1}(x)$ is defined. By the local definition of the face maps:

**Lemma 2.** *Trees do not have any shortcuts.*

**Trees.** We say that a pHDA has the **unique path property modulo confluent homotopy** if it has no shortcut and there is exactly one class of paths modulo confluent homotopy from the initial state to any state. Given such a pHDA X and an element x of X, by the **depth of** x we mean the length of any path from the initial state to x in X. Since confluently homotopic paths have the same length, this is well defined. We deduce from the previous discussions that:

**Proposition 4.** *Trees have the unique path property modulo confluent homotopy.*

In the following, we will prove the converse: trees, defined as colimits of path shapes, are exactly those pHDA that have the unique path property modulo confluent homotopy. This will be done by proving that such a pHDA X is isomorphic to its unfolding. A question arises at this point. Much as in the general framework of [4], trees are colimits of paths. Everything tends to work well when those colimits have a nice property, which we call **accessibility**: intuitively, the colimit process does not 'create' paths. This property is deeply related to the unicity of paths. Since this unicity fails in the case of pHDA, accessibility fails too. However, accessibility modulo confluent homotopy holds: the colimit process in pHDA does not create confluent homotopy classes of paths.

#### **4.3 Trees Are Unfoldings**

We now construct the unfolding U(X) of a pHDA X by giving an explicit definition, similar to [6,11], and prove that it is a tree. We will prove that there is a covering unf<sub>X</sub> : U(X) −→ X, which in particular means that the unfolding U(X) is **PS<sub>L</sub>**-bisimilar (in the general sense of [17]) to X, and that this covering is actually an isomorphism when X has the unique path property modulo confluent homotopy.

**Unfolding of a pHDA.** Let us start with a few notations. Given a path $\pi = x_0 \xrightarrow{j_1,\alpha_1} x_1 \xrightarrow{j_2,\alpha_2} \dots \xrightarrow{j_n,\alpha_n} x_n$, we write $e(\pi) = x_n$, $l(\pi) = n$ and $\pi_{-k} = x_0 \xrightarrow{j_1,\alpha_1} x_1 \xrightarrow{j_2,\alpha_2} \dots \xrightarrow{j_{n-k},\alpha_{n-k}} x_{n-k}$. Given a pHDA X, its **unfolding** is the following pHDA:

	- $\partial^1_i(\alpha) = [\pi \xrightarrow{i,1} \partial^1_i(e(\pi))]$, for any $\pi \in \alpha$ such that $\partial^1_i(e(\pi))$ is defined,
	- $\partial^0_i(\alpha) = [\pi_{-1}]$ for any $\pi \in \alpha$ such that $\pi = \pi_{-1} \xrightarrow{i,0} e(\pi)$,

Following ideas from [4] again, the unfolding can be seen as the glueing of all possible executions of a system, but with care needed to handle confluent homotopy. Concretely:
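In dimension 1, a pHDA is essentially a plain transition system and confluent homotopy is trivial, so the states of the unfolding are simply the paths from the initial state. A minimal sketch of that special case (names and the depth bound are ours; the construction in the text is the general, graded one):

```python
# Special-case sketch: for a 1-dimensional pHDA (a plain transition system),
# homotopy classes of paths are just paths, so the states of U(X) up to
# depth n are the paths from the initial state. A loop yields infinitely
# many paths, hence the explicit depth bound.

def unfold(init, edges, depth):
    """edges: dict state -> list of (label, target). Returns path-states,
    each a tuple (s0, l1, s1, l2, s2, ...)."""
    states, frontier = [(init,)], [(init,)]
    for _ in range(depth):
        frontier = [p + (lbl, tgt)
                    for p in frontier
                    for (lbl, tgt) in edges.get(p[-1], [])]
        states += frontier
    return states

# a single self-loop unrolls into one path-state per length:
loop = {'s': [('a', 's')]}
print(len(unfold('s', loop, 3)))  # 4 path-states: lengths 0, 1, 2, 3
```

The covering unf<sub>X</sub> then maps each path-state to its last element `p[-1]`, i.e. [π] to e(π), as in the text.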

**Proposition 5.** *The unfolding of a pHDA is a tree.*

We can also define unf<sub>X</sub> : U(X) −→ X as the function that maps [π] to e(π).

**Proposition 6.** *unf<sub>X</sub> is a covering, and so U(X) is PS<sub>L</sub>-bisimilar to X.*

**The Unique Path Property Characterises Trees.** When X has exactly one class of paths modulo confluent homotopy from the initial state to any state, it is possible to define a function η<sub>X</sub> : X −→ U(X) that maps any element x of X to the unique confluent homotopy class of paths from the initial state to x. When furthermore X does not have shortcuts, η<sub>X</sub> is actually a morphism of pHDA.

**Proposition 7.** *When* X *has the unique path property modulo confluent homotopy, then* η<sup>X</sup> *is the inverse of unf*X*. In particular,* X *is a tree.*

Together with Proposition 4, this implies the following:

**Theorem 2.** *Trees are exactly the pHDA that have the unique path property modulo confluent homotopy.*

Another consequence is that this isomorphism η<sup>X</sup> is actually natural (in the categorical sense) and is part of an adjunction, which implies that trees form a coreflective subcategory of pHDA:

**Corollary 1.** U *extends to a functor, which is the right adjoint of the embedding* ι : **Tr<sub>L</sub>** −→ **pHDA<sub>L</sub>**. *Furthermore, this is a coreflection.*

#### **5 Cofibrant Objects**

Cofibrant objects are another type of 'simple objects', coming from homotopy theory, more particularly the language of model categories from [24]. The cofibrant objects are those whose unique morphism from the initial object is a cofibration. Intuitively (an intuition which holds at least in cofibrantly generated model structures [15]), this means that cofibrant objects are those objects constructed from 'nothing', using only very basic constructions (the generators of cofibrations). In the case of the classical (Kan-Quillen) model structure on topological spaces, the cofibrant objects are the spaces constructed from the empty space by attaching 'cells', which produces what are called CW-complexes. In this section, we want to mimic this idea with trees: trees are those pHDA constructed from an initial state by only extending paths. We also want to emphasise that, much as CW-complexes give a kind of homotopy type of a space, trees give a concurrency type of a pHDA, in the sense that there is a canonical way to produce an equivalent cofibrant object out of any object, called the **cofibrant replacement** in homotopy theory. In concurrency theory, this is the unfolding.

# **5.1 Cofibrant Objects in pHDA<sup>L</sup>**

Following the language of model structures from [24], we say that a pHDA X is **cofibrant** if for every **PS<sub>L</sub>**-open morphism f : Y −→ Z and every morphism g : X −→ Z, there is a morphism h : X −→ Y such that f ◦ h = g. That is, a partial HDA X is cofibrant if and only if every **PS<sub>L</sub>**-open morphism has the right lifting property with respect to the unique morphism from ∗ to X.

#### **5.2 Cofibrant Objects Are Exactly Trees**

In this section, we would like to prove the following:

**Theorem 3.** *The cofibrant objects are exactly the trees.*

Let us start by giving the idea of the proof that cofibrant objects are trees. By Proposition 6, unf<sub>X</sub> is a covering, hence open. This means that for every cofibrant object X, there is a morphism h : X −→ U(X) such that unf<sub>X</sub> ◦ h = id<sub>X</sub>, that is, X is a retract of its unfolding. Since the unfolding is a tree by Proposition 5, it is enough to observe the following:

**Lemma 3.** *A retract of a tree is a tree.*

Intuitively, a pHDA is a retract of a tree only when it is obtained by retracting branches, and this can only produce a tree. For the converse:

**Proposition 8.** *A tree is a cofibrant object. Furthermore, if* f : Y −→ Z *is a covering, then the lift* h : X −→ Y *is unique.*

The lift h is constructed by induction as follows. We define X<sub>n</sub> as the restriction of X to elements whose depth is smaller than n, where the face maps $\partial^{\alpha_1,\dots,\alpha_m}_{j_1<\dots<j_m}(x)$ are defined if and only if they are defined in X and their value belongs to X<sub>n</sub>. We then construct h<sub>n</sub> : X<sub>n</sub> −→ Y using the unique path property modulo confluent homotopy, in a natural way (in the categorical sense), i.e., such that h<sub>n</sub> ◦ κ<sub>n</sub> = h<sub>n−1</sub>, where κ<sub>n</sub> : X<sub>n−1</sub> −→ X<sub>n</sub> is the inclusion. h is then the inductive limit of the h<sub>n</sub>. This proof can be seen as a small object argument.

#### **5.3 The Unfolding Is Universal**

As an application of the previous theorem, we would like to prove that the unfolding is universal. As in the case of covering spaces in algebraic topology, a covering corresponds to a partial unrolling of a system, in the sense that we can unroll some loops or even partially unroll a loop (imagine, for example, executing a few steps of a while-loop). In this sense, we can say that a covering unrolls more than another one, and that an unfolding is a complete unrolling: since its domain is a tree, it is impossible to unroll more. Actually, much as in the topological and groupoidal cases (see [18] for example), unfoldings are the only such maximal unrollings among coverings: they are initial among coverings, which is why we call them 'universal'. In a way, this says that our definition of unfolding is the only reasonable one. Concretely, we say that a **PS<sub>L</sub>**-covering is **universal** if its domain is a tree.

**Corollary 2.** *If* f : Y −→ X *is a universal covering, then for every covering* g : Z −→ X *there is a unique map* h : Y −→ Z *such that* f = g ◦ h*. Furthermore,* h *is itself a covering. Consequently, the universal covering is unique up to isomorphism, and is given by the unfolding.*

This whole story is similar to the universal covering of a topological space: just replace pHDA by spaces and trees by simply-connected spaces [2].

# **6 Conclusion and Future Work**

In this paper, we have given a cleaner definition of partial precubical sets and partial Higher Dimensional Automata, as collections of cubes with missing faces. From this categorical definition, we derived that pHDA can be completed, giving rise to a geometric realisation. We also described the first premises of a homotopy theory of the concurrency of pHDA, in which the cofibrant objects are the trees and the cofibrant replacement is the unfolding. As future work, we could look at a wider class of paths, typically allowing shortcuts as paths, or introducing general homotopies in the path category, which is possible because those can be encoded inside the category of pHDA. Another direction would be to continue the description of this homotopy theory, to see if it corresponds to some kind of Quillen model structure, or at least to some weaker version (e.g., a category of cofibrant objects).

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **The Bernays-Schönfinkel-Ramsey Class of Separation Logic on Arbitrary Domains**

Mnacho Echenim<sup>1</sup>, Radu Iosif<sup>2</sup>, and Nicolas Peltier<sup>1(B)</sup>

<sup>1</sup> Univ. Grenoble Alpes, CNRS, LIG, 38000 Grenoble, France Nicolas.peltier@imag.fr

<sup>2</sup> Univ. Grenoble Alpes, CNRS, VERIMAG, 38000 Grenoble, France

**Abstract.** This paper investigates the satisfiability problem for Separation Logic with k record fields, with unrestricted nesting of separating conjunctions and implications, for prenex formulæ with quantifier prefix ∃<sup>∗</sup>∀<sup>∗</sup>. In analogy with first-order logic, we call this fragment Bernays-Schönfinkel-Ramsey Separation Logic [BSR(SL<sup>k</sup>)]. In contrast to existing work in Separation Logic, in which the universe of possible locations is assumed to be infinite, both finite and infinite universes are considered. We show that, unlike in first-order logic, the (in)finite satisfiability problem is undecidable for BSR(SL<sup>k</sup>). Then we define two non-trivial subsets thereof that are decidable for finite and infinite satisfiability, respectively, by controlling the occurrences of universally quantified variables within the scope of separating implications, as well as the polarity of the occurrences of the latter. Besides the theoretical interest, our work has natural applications in program verification, for checking that constraints on the shape of a data structure are preserved by a sequence of transformations.

### **1 Introduction**

Separation Logic [10,14], or SL, is a logical framework used in program verification to describe properties of the dynamically allocated memory, such as topologies of data structures (lists, trees), (un)reachability between pointers, etc. In a nutshell, given an integer k ≥ 1, the logic SL<sup>k</sup> is obtained from the first-order theory of a finite partial function h : U ⇀ U<sup>k</sup> called a *heap*, by adding two substructural connectives: (i) the *separating conjunction* φ<sub>1</sub> ∗ φ<sub>2</sub>, which asserts a split of the heap into disjoint heaps satisfying φ<sub>1</sub> and φ<sub>2</sub> respectively, and (ii) the *separating implication* or *magic wand* φ<sub>1</sub> −∗ φ<sub>2</sub>, stating that each extension of the heap by a heap satisfying φ<sub>1</sub> must satisfy φ<sub>2</sub>. Intuitively, U is the universe of possible memory locations (cells) and k is the number of record fields in each memory cell.

The separating connectives ∗ and −∗ allow concise definitions of program semantics, via weakest precondition calculi [10], and easy-to-write specifications of recursive linked data structures (e.g. singly- and doubly-linked lists, trees with linked leaves and parent pointers, etc.), when higher-order inductive definitions are added [14]. Investigating the decidability and complexity of the satisfiability problem for fragments of SL is of theoretical and practical interest. In this paper, we consider prenex SL formulæ with prefix ∃<sup>∗</sup>∀<sup>∗</sup>. In analogy with first-order logic with equality and uninterpreted predicates [12], we call this fragment Bernays-Schönfinkel-Ramsey SL [BSR(SL<sup>k</sup>)].
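The heap-splitting semantics of ∗ described above can be prototyped directly for finite heaps. A minimal sketch, with an encoding that is entirely ours (locations as integers, a heap as a dict mapping a location to a k-tuple of locations, formulæ as predicates on heaps):

```python
from itertools import combinations

# Sketch of the separating conjunction on finite heaps. A heap is a finite
# partial function, here a dict from a location to a k-tuple of locations.
# The encoding of formulae as heap predicates is ours, for illustration only.

def splits(heap):
    """All ways to split a heap into two disjoint sub-heaps."""
    keys = list(heap)
    for r in range(len(keys) + 1):
        for left in combinations(keys, r):
            yield ({x: heap[x] for x in left},
                   {x: heap[x] for x in keys if x not in left})

def points_to(x, ys):
    """x |-> (y1..yk): the heap is exactly the single cell x (strict)."""
    return lambda heap: heap == {x: ys}

def sep(phi1, phi2):
    """phi1 * phi2: some split satisfies phi1 and phi2 respectively."""
    return lambda heap: any(phi1(h1) and phi2(h2) for h1, h2 in splits(heap))

top = lambda heap: True

h = {1: (2, 0), 2: (3, 1)}   # two cells with k = 2 record fields
print(sep(points_to(1, (2, 0)), top)(h))                    # True
print(sep(points_to(1, (2, 0)), points_to(1, (2, 0)))(h))   # False
```

The second query is false because a split is into *disjoint* heaps: the single cell at location 1 cannot appear on both sides.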

As far as we are aware, all existing work on SL assumes that the universe (set of available locations) is countably infinite. This assumption is not necessarily realistic in practice, since the available memory is usually finite, although the bound depends on the hardware and is not known in advance. The finite universe hypothesis is especially useful when dealing with bounded-memory issues, for instance checking that the execution of a program satisfies its postcondition, provided that there are sufficiently many available memory cells. In this paper we consider both the finite and infinite satisfiability problems. We show that both problems are undecidable for BSR(SL<sup>k</sup>) (unlike in first-order logic) and that they become PSPACE-complete under some additional restrictions, related to the occurrences of the magic wand and of universally quantified variables:


Reasoning on finite domains is more difficult than on infinite ones, due to the possibility of asserting cardinality constraints on unallocated cells, which explains why the latter condition is more restrictive than the former. Actually, the finite satisfiability problem is undecidable even if there is only one positive occurrence of a −∗, with no variable within its scope. These results establish sharp decidability frontiers within BSR(SL<sup>k</sup>).

Undecidability is shown by reduction from BSR first-order formulæ with two monadic function symbols. To establish the decidability results, we first show that every quantifier-free SL formula can be transformed into an equivalent boolean combination of formulæ of some specific patterns, called *test formulæ*. This result is interesting in itself, since it provides a precise and intuitive characterization of the expressive power of SL: it shows that separating connectives can be confined to a small set of test formulæ. Afterwards, we show that such test formulæ can be transformed into first-order formulæ. If the above conditions (1) or (2) are satisfied, then the obtained first-order formulæ are in the BSR class, which ensures decidability. The PSPACE upper bound relies on a careful analysis of the maximal size of the test formulæ. The analysis reveals that, although the boolean combination of test formulæ is of exponential size, its components (e.g., the conjuncts of its disjunctive normal form) are of polynomial size and can be enumerated in polynomial space. For space reasons, full details and proofs are given in a technical report [8].

**Applications.** Besides theoretical interest, our work has natural applications in program verification. Indeed, purely universal SL formulæ are useful to express pre- or postconditions asserting 'local' constraints on the shape of the data structures manipulated by a program. Consider the atomic proposition x ↦ (y<sub>1</sub>,...,y<sub>k</sub>), which states that the value of the heap at x is the tuple (y<sub>1</sub>,...,y<sub>k</sub>) and there is no value other than x in the domain of h. With this in mind, the following formula describes a well-formed doubly-linked list:

$$\forall x_1, x_2, x_3, x_4, x_5 \;.\; x_1 \mapsto (x_2, x_3) * x_2 \mapsto (x_4, x_5) * \top \Rightarrow x_5 \approx x_1 \land \neg x_3 \approx x_4 \tag{1}$$

Such constraints could also be expressed by using inductively defined predicates; unfortunately, checking satisfiability of SL formulæ, even of very simple fragments with no occurrence of −∗, in the presence of user-defined inductive predicates is undecidable, unless some rather restrictive conditions are fulfilled [9]. In contrast, checking entailment between two universal formulæ boils down to checking the satisfiability of a BSR(SL<sup>k</sup>) formula, which can be done thanks to the decidability results in our paper.

The separating implication (magic wand) seldom occurs in such shape constraints. However, it is useful to describe the dynamic transformations of the data structures, as in the following Hoare-style axiom, giving the weakest precondition of ∀**u** . ψ with respect to redirecting the i-th record field of x to z [10]:

$$\{\,x \mapsto (y_1, \dots, y_k) * [x \mapsto (y_1, \dots, y_{i-1}, z, y_{i+1}, \dots, y_k) \mathbin{-\!\!*} \forall \mathbf{u} \;.\; \psi]\,\}\ \ \mathtt{x.i := z}\ \ \{\forall \mathbf{u} \;.\; \psi\}$$

It is easy to check that the precondition is equivalent to the formula ∀**u** . x ↦ (y<sub>1</sub>,...,y<sub>k</sub>) ∗ [x ↦ (y<sub>1</sub>,...,y<sub>i−1</sub>, z, y<sub>i+1</sub>,...,y<sub>k</sub>) −∗ ψ] because, although hoisting universal quantifiers outside of the separating conjunction is unsound in general, it is possible here due to the special form of the left-hand side x ↦ (y<sub>1</sub>,...,y<sub>i−1</sub>, z, y<sub>i+1</sub>,...,y<sub>k</sub>), which unambiguously defines a single heap cell. Therefore, checking that ∀**u** . ψ is an invariant of the program statement x.i := z amounts to checking that the formula ∀**u** . ψ ∧ ∃**u** . ¬[x ↦ (y<sub>1</sub>,...,y<sub>k</sub>) ∗ (x ↦ (y<sub>1</sub>,...,y<sub>i−1</sub>, z, y<sub>i+1</sub>,...,y<sub>k</sub>) −∗ ψ)] is unsatisfiable. Because the magic wand occurs negated, this formula falls into a decidable class defined in the present paper, for both finite and infinite satisfiability. The complete formalization of this deductive program verification technique, and the characterization of the class of programs for which it is applicable, is outside the scope of this paper and left for future work.
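Over a finite universe, the extension semantics of −∗ can likewise be evaluated by brute-force enumeration of disjoint extensions. A sketch under an ad-hoc encoding of ours (k = 1 record field, a three-element universe, and only extensions of at most one cell, purely for illustration):

```python
# Sketch of the magic wand over a small finite universe U: h satisfies
# phi1 -* phi2 iff every heap h' disjoint from h with h' |= phi1 gives
# (h union h') |= phi2. Brute force below restricts to k = 1 and to
# extensions of at most one cell, which suffices for this example.

U = [0, 1, 2]

def wand(phi1, phi2):
    def holds(heap):
        free = [x for x in U if x not in heap]
        exts = [{}] + [{x: (y,)} for x in free for y in U]
        return all(phi2({**heap, **h1}) for h1 in exts if phi1(h1))
    return holds

def cell(x, ys):
    """x |-> ys with strict single-cell semantics."""
    return lambda heap: heap == {x: ys}

# with cell 1 |-> (2,) in hand, adding 2 |-> (0,) always yields both cells:
both = lambda heap: heap == {1: (2,), 2: (0,)}
print(wand(cell(2, (0,)), both)({1: (2,)}))  # True
```

Only the extension {2: (0,)} satisfies the antecedent, and merging it with the current heap yields exactly the two-cell heap, so the wand holds.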

**Related Work.** In contrast to first-order logic, for which the decision problem has been thoroughly investigated [1], only a few results are known for SL. For instance, the problem is undecidable in general and PSPACE-complete for quantifier-free formulæ [4]. For k = 1, the problem is also undecidable, but it is PSPACE-complete if in addition there is only one quantified variable [6], and decidable but nonelementary if there is no magic wand [2]. In particular, we have also studied the prenex form of SL<sup>1</sup> [7] and found that it is decidable and nonelementary, whereas BSR(SL<sup>1</sup>) is PSPACE-complete. In contrast, in this paper we show that undecidability occurs for BSR(SL<sup>k</sup>), for k ≥ 2.

Expressive completeness results exist for quantifier-free SL<sup>1</sup> [2,11] and for SL<sup>1</sup> with one and two quantified variables [5,6]. There, the existence of equivalent boolean combinations of test formulæ is shown implicitly, using a finite enumeration of equivalence classes of models, instead of an effective transformation. Instead, here we present an explicit equivalence-preserving transformation of quantifier-free SL<sup>k</sup> into boolean combinations of test formulæ, and translate the latter into first-order logic. Further, we extend the expressive completeness result to finite universes, with additional test formulæ asserting cardinality constraints on unallocated cells.

Another translation of quantifier-free SL<sup>k</sup> into first-order logic with equality has been described in [3]. There, the small model property of quantifier-free SL<sup>k</sup> [4] is used to bound the number of first-order variables to be considered, and the separating connectives are interpreted as first-order quantifiers. The result is an equisatisfiable first-order formula. This translation scheme cannot, however, be directly applied to BSR(SL<sup>k</sup>), which does not have a small model property, being moreover undecidable. Theory-parameterized versions of BSR(SL<sup>k</sup>) have been shown to be undecidable, e.g. when integer linear arithmetic is used to reason about locations, and claimed to be PSPACE-complete for countably infinite and finite unbounded location sorts, with no relation other than equality [13]. In the present paper, we show that this claim is wrong, and draw a precise chart of decidability for both infinite and finite satisfiability of BSR(SL<sup>k</sup>).

#### **2 Preliminaries**

**Basic Definitions.** Let Z<sub>∞</sub> = Z ∪ {∞} and N<sub>∞</sub> = N ∪ {∞}, where n + ∞ = ∞ and n < ∞ for each n ∈ Z. For a countable set S, we denote by ||S|| ∈ N<sub>∞</sub> the cardinality of S. Let Var be a countable set of variables, denoted x, y, z, and let U be a sort. Vectors of variables are denoted by **x**, **y**, etc. A *function symbol* f has #(f) ≥ 0 arguments of sort U and a sort σ(f), which is either the boolean sort Bool or U. If #(f) = 0, we call f a *constant*. We use ⊥ and ⊤ for the boolean constants false and true, respectively. First-order (FO) terms t and formulæ ϕ are defined by the following grammar:

$$t := x \mid f(\underbrace{t, \dots, t}_{\#(f)}) \qquad \varphi := \bot \mid \top \mid t \approx t \mid p(\underbrace{t, \dots, t}_{\#(p)}) \mid \varphi \land \varphi \mid \neg \varphi \mid \exists x \,.\, \varphi$$

where $x \in \mathsf{Var}$, $f$ and $p$ are function symbols, $\sigma(f) = U$ and $\sigma(p) = \mathsf{Bool}$. We write $\varphi_1 \lor \varphi_2$ for $\neg(\neg\varphi_1 \land \neg\varphi_2)$, $\varphi_1 \to \varphi_2$ for $\neg\varphi_1 \lor \varphi_2$, $\varphi_1 \leftrightarrow \varphi_2$ for $(\varphi_1 \to \varphi_2) \land (\varphi_2 \to \varphi_1)$ and $\forall x \,.\, \varphi$ for $\neg\exists x \,.\, \neg\varphi$. The size of a formula $\varphi$, denoted $\mathrm{size}(\varphi)$, is the number of symbols needed to write it down. Let $\mathrm{var}(\varphi)$ be the set of variables that occur free in $\varphi$, i.e. not in the scope of a quantifier. A *sentence* $\varphi$ is a formula with $\mathrm{var}(\varphi) = \emptyset$.

First-order formulæ are interpreted over FO-structures (called structures when no confusion arises) $\mathcal{S} = (U, s, i)$, where $U$ is a countable set, called the *universe*, whose elements are called *locations*, $s : \mathsf{Var} \to U$ is a mapping of variables to locations, called a *store*, and $i$ interprets each function symbol $f$ by a function $f^i : U^{\#(f)} \to U$ if $\sigma(f) = U$, and $f^i : U^{\#(f)} \to \{\bot^i, \top^i\}$ if $\sigma(f) = \mathsf{Bool}$. A structure $(U, s, i)$ is *finite* when $||U|| \in \mathbb{N}$ and *infinite* otherwise. We write $\mathcal{S} \models \varphi$ iff $\varphi$ is true when interpreted in $\mathcal{S}$; this relation is defined recursively on the structure of $\varphi$, as usual. When $\mathcal{S} \models \varphi$, we say that $\mathcal{S}$ is a *model* of $\varphi$. A formula is [finitely] *satisfiable* when it has a [finite] model. We write $\varphi_1 \equiv \varphi_2$ when $(U, s, i) \models \varphi_1 \Leftrightarrow (U, s, i) \models \varphi_2$, for every structure $(U, s, i)$.

The Bernays-Schönfinkel-Ramsey fragment of FO, denoted BSR(FO), is the set of sentences $\exists x_1 \ldots \exists x_n \forall y_1 \ldots \forall y_m \,.\, \varphi$, where $\varphi$ is a quantifier-free formula in which all function symbols $f$ of arity $\#(f) > 0$ have sort $\sigma(f) = \mathsf{Bool}$.

**Separation Logic.** Let $k$ be a strictly positive integer. The logic SL<sup>k</sup> is the set of formulæ generated by the grammar:

$$\varphi := \bot \mid \top \mid \mathsf{emp} \mid x \approx y \mid x \mapsto (y_1, \dots, y_k) \mid \varphi \land \varphi \mid \neg \varphi \mid \varphi * \varphi \mid \varphi \mathrel{-\!\!*} \varphi \mid \exists x \,.\, \varphi$$

where $x, y, y_1, \dots, y_k \in \mathsf{Var}$. The connectives $*$ and $\mathrel{-\!\!*}$ are respectively called the *separating conjunction* and *separating implication* (*magic wand*). We write $\varphi_1 \mathrel{-\!\!\diamond} \varphi_2$ for $\neg(\varphi_1 \mathrel{-\!\!*} \neg\varphi_2)$ ($\mathrel{-\!\!\diamond}$ is called *septraction*). The size and set of free variables of an SL<sup>k</sup> formula $\varphi$ are defined as for first-order formulæ.

Given an SL<sup>k</sup> formula $\phi$ and a subformula $\psi$ of $\phi$, we say that $\psi$ *occurs at polarity* $p \in \{-1, 0, 1\}$ iff one of the following holds: (i) $\phi = \psi$ and $p = 1$; (ii) $\phi = \neg\phi_1$ and $\psi$ occurs at polarity $-p$ in $\phi_1$; (iii) $\phi = \phi_1 \land \phi_2$ or $\phi = \phi_1 * \phi_2$, and $\psi$ occurs at polarity $p$ in $\phi_i$, for some $i = 1, 2$; or (iv) $\phi = \phi_1 \mathrel{-\!\!*} \phi_2$ and either $\psi$ is a subformula of $\phi_1$ and $p = 0$, or $\psi$ occurs at polarity $p$ in $\phi_2$. A polarity of $1$, $0$ or $-1$ is also referred to as positive, neutral or negative, respectively. Note that our notion of polarity differs slightly from the usual one: the antecedent of a separating implication has neutral polarity, whereas the antecedent of an implication is usually negative. This is meant to strengthen the upcoming decidability results, see Remark 2.

SL<sup>k</sup> formulæ are interpreted over SL-*structures* $\mathcal{I} = (U, s, h)$, where $U$ and $s$ are as before and $h : U \rightharpoonup_{\mathit{fin}} U^k$ is a finite partial mapping of locations to $k$-tuples of locations, called a *heap*. As before, a structure $(U, s, h)$ is finite when $||U|| \in \mathbb{N}$ and infinite otherwise. We denote by $\mathrm{dom}(h)$ the domain of the heap $h$ and by $||h|| \in \mathbb{N}$ the cardinality of $\mathrm{dom}(h)$. Two heaps $h_1$ and $h_2$ are *disjoint* iff $\mathrm{dom}(h_1) \cap \mathrm{dom}(h_2) = \emptyset$, in which case $h_1 \uplus h_2$ denotes their union. A heap $h'$ is an *extension* of $h$ by $h''$ iff $h' = h \uplus h''$. The relation $(U, s, h) \models \varphi$ is defined inductively, as follows (the original table was lost in extraction; we restate the standard clauses for the spatial connectives):

$$\begin{array}{lcl}
(U, s, h) \models \mathsf{emp} & \Leftrightarrow & h = \emptyset \\
(U, s, h) \models x \mapsto (y_1, \dots, y_k) & \Leftrightarrow & h = \{ \langle s(x), (s(y_1), \dots, s(y_k)) \rangle \} \\
(U, s, h) \models \varphi_1 * \varphi_2 & \Leftrightarrow & h = h_1 \uplus h_2 \text{ for some disjoint heaps } h_1, h_2 \\
& & \text{such that } (U, s, h_i) \models \varphi_i, \text{ for } i = 1, 2 \\
(U, s, h) \models \varphi_1 \mathrel{-\!\!*} \varphi_2 & \Leftrightarrow & \text{for every heap } h' \text{ disjoint from } h, \\
& & \text{if } (U, s, h') \models \varphi_1 \text{ then } (U, s, h' \uplus h) \models \varphi_2
\end{array}$$
The semantics of equality, boolean and first-order connectives is the usual one. Satisfiability, entailment and equivalence are defined for SL<sup>k</sup> as for FO formulæ.

The Bernays-Schönfinkel-Ramsey fragment of SL<sup>k</sup>, denoted BSR(SL<sup>k</sup>), is the set of sentences $\exists x_1 \ldots \exists x_n \forall y_1 \ldots \forall y_m \,.\, \phi$, where $\phi$ is a quantifier-free SL<sup>k</sup> formula. Since SL<sup>k</sup> has no function symbols of arity greater than zero, the only restriction defining BSR(SL<sup>k</sup>) is the form of the quantifier prefix.

# **3 Test Formulæ for SL***<sup>k</sup>*

We define a small set of SL<sup>k</sup> patterns of formulæ, possibly parameterized by a positive integer, called *test formulæ*. These patterns capture properties related to allocation, points-to relations in the heap and cardinality constraints.

**Definition 1.** *The following patterns are called* test formulæ*:*

$$\begin{array}{ll}
x \hookrightarrow \mathbf{y} \stackrel{\text{def}}{=} x \mapsto \mathbf{y} * \top & |U| \ge n \stackrel{\text{def}}{=} \top \mathrel{-\!\!\diamond} |h| \ge n,\ n \in \mathbb{N} \\[2pt]
\mathsf{alloc}(x) \stackrel{\text{def}}{=} x \mapsto \underbrace{(x, \ldots, x)}_{k \text{ times}} \mathrel{-\!\!*} \bot & |h| \ge |U| - n \stackrel{\text{def}}{=} |h| \ge n + 1 \mathrel{-\!\!*} \bot,\ n \in \mathbb{N} \\[2pt]
x \approx y & |h| \ge n \stackrel{\text{def}}{=} \begin{cases} |h| \ge n - 1 * \neg\mathsf{emp}, & \text{if } n \in \mathbb{N} \setminus \{0\} \\ \top, & \text{if } n = 0 \\ \bot, & \text{if } n = \infty \end{cases}
\end{array}$$

*where* $x, y \in \mathsf{Var}$*,* $\mathbf{y} \in \mathsf{Var}^k$ *and* $n \in \mathbb{N}^\infty$ *is a natural number or* $\infty$*.*

The semantics of test formulæ is very natural: $x \hookrightarrow \mathbf{y}$ means that $x$ points to the vector $\mathbf{y}$, $\mathsf{alloc}(x)$ means that $x$ is allocated, and the arithmetic expressions are interpreted as usual, where $|h|$ and $|U|$ respectively denote the number of allocated cells and the number of locations (possibly $\infty$). Formally:

**Proposition 1.** *Given an* SL*-structure* $(U, s, h)$*, the following equivalences hold, for all variables* $x, y_1, \dots, y_k \in \mathsf{Var}$ *and integers* $n \in \mathbb{N}$*:*

$$\begin{array}{ll}
(U, s, h) \models x \hookrightarrow \mathbf{y} \Leftrightarrow h(s(x)) = s(\mathbf{y}) & (U, s, h) \models |h| \ge |U| - n \Leftrightarrow ||h|| \ge ||U|| - n \\
(U, s, h) \models |U| \ge n \Leftrightarrow ||U|| \ge n & (U, s, h) \models |h| \ge n \Leftrightarrow ||h|| \ge n \\
(U, s, h) \models \mathsf{alloc}(x) \Leftrightarrow s(x) \in \mathrm{dom}(h)
\end{array}$$

Not all atoms of SL<sup>k</sup> are test formulæ; for instance, $x \mapsto \mathbf{y}$ and $\mathsf{emp}$ are not. However, by Proposition 1, we have the equivalences $x \mapsto \mathbf{y} \equiv x \hookrightarrow \mathbf{y} \land \neg |h| \ge 2$ and $\mathsf{emp} \equiv \neg |h| \ge 1$. Note that, for any $n \in \mathbb{N}$, the test formulæ $|U| \ge n$ and $|h| \ge |U| - n$ are trivially true and false, respectively, if the universe is infinite. We write $t < u$ for $\neg(t \ge u)$.
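These semantics are concrete enough to check mechanically. The sketch below (our own encoding, not from the paper) models heaps as Python dicts from locations to $k$-tuples and verifies the equivalence $x \mapsto \mathbf{y} \equiv x \hookrightarrow \mathbf{y} \land \neg |h| \ge 2$ by brute force over all heaps on a two-element universe with $k = 1$:

```python
# Minimal finite-model sketch: heaps are dicts mapping locations to k-tuples,
# stores are dicts mapping variables to locations. Test formulae are evaluated
# directly via the semantics of Proposition 1.
import itertools

def points_to(s, h, x, ys):   # x |-> y : the heap consists of exactly this cell
    return h == {s[x]: tuple(s[y] for y in ys)}

def hooks_to(s, h, x, ys):    # x ~> y : h(s(x)) = s(y)
    return h.get(s[x]) == tuple(s[y] for y in ys)

def alloc(s, h, x):           # alloc(x) : s(x) in dom(h)
    return s[x] in h

def h_ge(h, n):               # |h| >= n
    return len(h) >= n

# Enumerate every heap over the universe U = {0, 1}, with k = 1.
U, s = (0, 1), {"x": 0, "y": 1}
heaps = []
for d in range(len(U) + 1):
    for dom in itertools.combinations(U, d):
        for img in itertools.product([(u,) for u in U], repeat=d):
            heaps.append(dict(zip(dom, img)))

# Check  x |-> y  ≡  x ~> y /\ ¬(|h| >= 2)  on all heaps.
for h in heaps:
    lhs = points_to(s, h, "x", ["y"])
    rhs = hooks_to(s, h, "x", ["y"]) and not h_ge(h, 2)
    assert lhs == rhs
print("checked", len(heaps), "heaps")
```

The same harness extends to the other test formulæ of Definition 1, at the cost of enumerating heap extensions for the connectives $\mathrel{-\!\!*}$ and $\mathrel{-\!\!\diamond}$.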

We introduce a few notations that are useful for describing the upcoming transformations concisely and precisely. A *literal* is a test formula or its negation. Unless stated otherwise, we view a conjunction $T$ of literals as a set<sup>1</sup>, and we use the same symbol to denote both a set and the formula obtained by conjoining its elements. The equivalence relation $\approx_T$ is defined by $x \approx_T y$ iff $T \models x \approx y$, and we write $x \not\approx_T y$ for $T \models \neg x \approx y$. Observe that $x \not\approx_T y$ is not the complement of $x \approx_T y$. For a set $X$ of variables, $|X|_T$ is the number of equivalence classes of $\approx_T$ in $X$.

<sup>1</sup> The empty set is thus considered to be true.

**Definition 2.** *A variable* $x$ *is* allocated *in an* SL*-structure* $\mathcal{I}$ *iff* $\mathcal{I} \models \mathsf{alloc}(x)$*. For a set of variables* $X \subseteq \mathsf{Var}$*, let* $\mathsf{alloc}(X) \stackrel{\text{def}}{=} \bigwedge_{x \in X} \mathsf{alloc}(x)$ *and* $\mathsf{nalloc}(X) \stackrel{\text{def}}{=} \bigwedge_{x \in X} \neg\mathsf{alloc}(x)$*. For a set* $T$ *of literals, let:*

$$\begin{array}{lcl}
\mathsf{av}(T) & \stackrel{\text{def}}{=} & \{ x \in \mathsf{Var} \mid x \approx_T x',\ T \cap \{ \mathsf{alloc}(x'),\ x' \hookrightarrow \mathbf{y} \mid \mathbf{y} \in \mathsf{Var}^k \} \neq \emptyset \} \\
\mathsf{nv}(T) & \stackrel{\text{def}}{=} & \{ x \in \mathsf{Var} \mid x \approx_T x',\ \neg\mathsf{alloc}(x') \in T \} \\
\mathsf{fp}_X(T) & \stackrel{\text{def}}{=} & T \cap \{ \mathsf{alloc}(x),\ \neg\mathsf{alloc}(x),\ x \hookrightarrow \mathbf{y},\ \neg x \hookrightarrow \mathbf{y} \mid x \in X,\ \mathbf{y} \in \mathsf{Var}^k \}
\end{array}$$

*We let* $\#_a(T) \stackrel{\text{def}}{=} |\mathsf{av}(T)|_T$ *be the number of equivalence classes of* $\approx_T$ *containing variables allocated in every model of* $T$*, and* $\#_n(X, T) \stackrel{\text{def}}{=} |X \cap \mathsf{nv}(T)|_T$ *be the number of equivalence classes of* $\approx_T$ *containing variables from* $X$ *that are not allocated in any model of* $T$*. We also let* $\mathsf{fp}_a(T) \stackrel{\text{def}}{=} \mathsf{fp}_{\mathsf{av}(T)}(T)$*.*

Intuitively, $\mathsf{av}(T)$ [$\mathsf{nv}(T)$] is the set of variables that must be allocated in every model [are never allocated in any model] of $T$, and $\mathsf{fp}_X(T)$ is the *footprint* of $T$ relative to the set $X \subseteq \mathsf{Var}$, i.e. the set of literals describing allocation and points-to relations over variables from $X$. For example, if $T = \{x \approx z,\ \mathsf{alloc}(x),\ \neg\mathsf{alloc}(y),\ \neg z \hookrightarrow \mathbf{y}\}$, then $\mathsf{av}(T) = \{x, z\}$, $\mathsf{nv}(T) = \{y\}$, $\mathsf{fp}_a(T) = \{\mathsf{alloc}(x), \neg z \hookrightarrow \mathbf{y}\}$ and $\mathsf{fp}_{\mathsf{nv}(T)}(T) = \{\neg\mathsf{alloc}(y)\}$.
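The example above can be reproduced by a small computation. The encoding below is ours (hypothetical tuple syntax for literals, not from the paper), and it simplifies $\approx_T$ to the relation generated by the explicit equality literals in $T$, which suffices for conjunctions of positive equalities:

```python
# Compute av(T), nv(T) and fp_X(T) from Definition 2. Literals are encoded as
# tuples: ("eq", x, y), ("alloc", x), ("nalloc", x), ("hook", x, ys) for
# x ~> ys, and ("nhook", x, ys) for its negation.

def classes(T, variables):
    """Map each variable to a representative of its ~T-class (naive union-find
    over the explicit equality literals of T)."""
    parent = {v: v for v in variables}
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    for lit in T:
        if lit[0] == "eq":
            parent[find(lit[1])] = find(lit[2])
    return {v: find(v) for v in variables}

def av(T, variables):   # variables allocated in every model of T
    cls = classes(T, variables)
    hit = {cls[l[1]] for l in T if l[0] in ("alloc", "hook")}
    return {v for v in variables if cls[v] in hit}

def nv(T, variables):   # variables allocated in no model of T
    cls = classes(T, variables)
    hit = {cls[l[1]] for l in T if l[0] == "nalloc"}
    return {v for v in variables if cls[v] in hit}

def fp(T, X):           # footprint of T relative to X
    return [l for l in T if l[0] != "eq" and l[1] in X]

# The running example: T = {x ≈ z, alloc(x), ¬alloc(y), ¬z ~> y}
T = [("eq", "x", "z"), ("alloc", "x"), ("nalloc", "y"), ("nhook", "z", ("y",))]
V = {"x", "y", "z"}
assert av(T, V) == {"x", "z"} and nv(T, V) == {"y"}
assert fp(T, av(T, V)) == [("alloc", "x"), ("nhook", "z", ("y",))]
assert fp(T, nv(T, V)) == [("nalloc", "y")]
```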

#### **3.1 From Test Formulæ to FO**

The introduction of test formulæ (Definition 1) is motivated by the reduction of the (in)finite satisfiability problem for quantified boolean combinations thereof to the same problem for FO. The reduction is devised in such a way that the resulting formula is in the BSR class, whenever possible. Given a quantified boolean combination of test formulæ $\phi$, the FO formula $\tau(\phi)$ is defined by induction on the structure of $\phi$:

$$\begin{array}{ll}
\tau(|h| \ge n) \stackrel{\text{def}}{=} \mathfrak{a}_n & \tau(|U| \ge n) \stackrel{\text{def}}{=} \mathfrak{b}_n \\
\tau(|h| \ge |U| - n) \stackrel{\text{def}}{=} \neg\mathfrak{c}_{n+1} & \tau(\neg\phi_1) \stackrel{\text{def}}{=} \neg\tau(\phi_1) \\
\tau(x \hookrightarrow \mathbf{y}) \stackrel{\text{def}}{=} \mathfrak{p}(x, y_1, \dots, y_k) & \tau(\mathsf{alloc}(x)) \stackrel{\text{def}}{=} \exists y_1 \dots \exists y_k \,.\, \mathfrak{p}(x, y_1, \dots, y_k) \\
\tau(\phi_1 \land \phi_2) \stackrel{\text{def}}{=} \tau(\phi_1) \land \tau(\phi_2) & \tau(\exists x \,.\, \phi_1) \stackrel{\text{def}}{=} \exists x \,.\, \tau(\phi_1) \\
\tau(x \approx y) \stackrel{\text{def}}{=} x \approx y
\end{array}$$

where $\mathfrak{p}$ is a $(k + 1)$-ary function symbol of sort Bool, and $\mathfrak{a}_n$, $\mathfrak{b}_n$ and $\mathfrak{c}_n$ are constants of sort Bool, for all $n \in \mathbb{N}$. These function symbols are related by the following axioms, where $\mathsf{u}_n$, $\mathsf{v}_n$ and $\mathsf{w}_n$ are constants of sort $U$, for all $n > 0$:

$$\begin{array}{ll}
P: & \forall x \forall \mathbf{y} \forall \mathbf{y}' \,.\, \mathfrak{p}(x, \mathbf{y}) \land \mathfrak{p}(x, \mathbf{y}') \to \bigwedge_{i=1}^{k} y_i \approx y'_i \\[4pt]
A_0: \mathfrak{a}_0 \qquad & A_n: \big(\mathfrak{a}_n \to \mathfrak{a}_{n-1} \land \exists \mathbf{y} \,.\, \mathfrak{p}(\mathsf{u}_n, \mathbf{y}) \land \bigwedge_{i=1}^{n-1} \neg\, \mathsf{u}_i \approx \mathsf{u}_n\big) \\
& \phantom{A_n:}\ \land\ \forall x \forall \mathbf{y} \,.\, \neg\mathfrak{a}_n \land \mathfrak{p}(x, \mathbf{y}) \to \bigvee_{i=1}^{n-1} x \approx \mathsf{u}_i \\[4pt]
B_0: \mathfrak{b}_0 \qquad & B_n: \big(\mathfrak{b}_n \to \mathfrak{b}_{n-1} \land \bigwedge_{i=1}^{n-1} \neg\, \mathsf{v}_i \approx \mathsf{v}_n\big) \ \land\ \forall x \,.\, \neg\mathfrak{b}_n \to \bigvee_{i=1}^{n-1} x \approx \mathsf{v}_i \\[4pt]
C_0: \mathfrak{c}_0 \qquad & C_n: \forall \mathbf{y} \,.\, \mathfrak{c}_n \to \mathfrak{c}_{n-1} \land \neg\mathfrak{p}(\mathsf{w}_n, \mathbf{y}) \land \bigwedge_{i=1}^{n-1} \neg\, \mathsf{w}_n \approx \mathsf{w}_i
\end{array}$$

Intuitively, $\mathfrak{p}$ encodes the heap, and $\mathfrak{a}_n$ (resp. $\mathfrak{b}_n$) is true iff there are at least $n$ cells in the domain of the heap (resp. in the universe), namely $\mathsf{u}_1, \dots, \mathsf{u}_n$ (resp. $\mathsf{v}_1, \dots, \mathsf{v}_n$). If $\mathfrak{c}_n$ is true, then there are at least $n$ locations $\mathsf{w}_1, \dots, \mathsf{w}_n$ outside the domain of the heap (free), but the converse does not hold. The $C_n$ axioms do not state the equivalence of $\mathfrak{c}_n$ with the existence of at least $n$ free locations, because such an equivalence cannot be expressed in BSR(FO)<sup>2</sup>. As a consequence, the transformation preserves sat-equivalence only if the formulæ $|h| \ge |U| - n$ occur only at negative polarity (see Lemma 1, Point 2). If the domain is infinite, this problem does not arise, since the formulæ $|h| \ge |U| - n$ are always false.

**Definition 3.** *For a quantified boolean combination of test formulæ* $\phi$*, let* $\mathcal{N}(\phi)$ *be the maximum integer* $n$ *occurring in a test formula* $\theta$ *of the form* $|h| \ge n$*,* $|U| \ge n$ *or* $|h| \ge |U| - n$ *in* $\phi$*, and define* $\mathcal{A}(\phi) \stackrel{\text{def}}{=} \{P\} \cup \{A_i\}_{i=0}^{\mathcal{N}(\phi)} \cup \{B_i\}_{i=0}^{\mathcal{N}(\phi)} \cup \{C_i\}_{i=0}^{\mathcal{N}(\phi)+1}$ *as the set of axioms related to* $\phi$*.*

The relationship between $\phi$ and $\tau(\phi)$ is stated below.

**Lemma 1.** *Let* φ *be a quantified boolean combination of test formulæ. The following hold, for any universe* U *and any store* s*:*


The translation of $\mathsf{alloc}(x)$ introduces existential quantifiers depending on $x$. For instance, $\forall x \,.\, \mathsf{alloc}(x)$ is translated as $\forall x \exists y_1 \ldots \exists y_k \,.\, \mathfrak{p}(x, y_1, \dots, y_k)$, which lies outside of the BSR(FO) fragment. Because the upcoming decidability results (Theorem 2) require that $\tau(\phi)$ be in BSR(FO), we end this section by delimiting a fragment of SL<sup>k</sup> whose translation falls within BSR(FO).

**Lemma 2.** *Given an* SL<sup>k</sup> *formula* $\varphi = \forall z_1 \ldots \forall z_m \,.\, \phi$*, where* $\phi$ *is a boolean combination of test formulæ containing no positive occurrence of* $\mathsf{alloc}(z_i)$*, for any* $i \in [1, m]$*,* $\tau(\varphi)$ *is equivalent (up to transformation into prenex form) to a* BSR(FO) *formula with the same constants and free variables as* $\tau(\varphi)$*.*

Intuitively, if a formula $\mathsf{alloc}(x)$ occurs negatively, then the quantifiers $\exists y_1 \ldots \exists y_k$ added when translating $\mathsf{alloc}(x)$ can be turned into universal ones by transformation into negation normal form (nnf); and if $x$ is not universally quantified, they may instead be shifted to the root of the formula, since $y_1, \dots, y_k$ depend only on $x$. In both cases, the quantifier prefix $\exists^*\forall^*$ is preserved.
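To illustrate the first case of this argument (a worked instance of ours, not spelled out in the text), the translation of a negative occurrence of $\mathsf{alloc}(x)$ becomes universal in nnf:

$$\tau(\neg\mathsf{alloc}(x)) = \neg \exists y_1 \ldots \exists y_k \,.\, \mathfrak{p}(x, y_1, \dots, y_k) \;\equiv\; \forall y_1 \ldots \forall y_k \,.\, \neg\mathfrak{p}(x, y_1, \dots, y_k)$$

so that, e.g., $\forall x \,.\, \neg\mathsf{alloc}(x)$ is translated into $\forall x \forall y_1 \ldots \forall y_k \,.\, \neg\mathfrak{p}(x, y_1, \dots, y_k)$, which is in BSR(FO).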

<sup>2</sup> The converse of $C_n$: $\forall x \,.\, (\neg\mathfrak{c}_n \land \forall \mathbf{y} \,.\, \neg\mathfrak{p}(x, \mathbf{y})) \to \bigvee_{i=1}^{n-1} x \approx \mathsf{w}_i$ is not in BSR(FO).

# **4 From Quantifier-Free SL***<sup>k</sup>* **to Test Formulæ**

This section states the expressive completeness result of the paper, namely that any quantifier-free SL<sup>k</sup> formula is equivalent, on both finite and infinite models, to a boolean combination of test formulæ. Starting from a quantifier-free SL<sup>k</sup> formula $\varphi$, we define a set $\mu(\varphi)$ of conjunctions of test formulæ and their negations, called *minterms*, such that $\varphi \equiv \bigvee_{M \in \mu(\varphi)} M$. Although the number of minterms in $\mu(\varphi)$ is exponential in the size of $\varphi$, checking the membership of a given minterm $M$ in $\mu(\varphi)$ can be done in PSPACE. Together with the translation of minterms into FO (Sect. 3.1), this fact is used to prove PSPACE membership for the two decidable fragments of BSR(SL<sup>k</sup>), defined next (Sect. 5.2).

#### **4.1 Minterms**

A *minterm* $M$ is a set (conjunction) of literals containing: exactly one literal of the form $|h| \ge \mathsf{min}_M$ and one literal of the form $|h| < \mathsf{max}_M$, where $\mathsf{min}_M \in \mathbb{N} \cup \{|U| - n \mid n \in \mathbb{N}\}$ and $\mathsf{max}_M \in \mathbb{N}^\infty \cup \{|U| - n \mid n \in \mathbb{N}\}$; and at most one literal of the form $|U| \ge n$, respectively $|U| < n$.

A minterm may be viewed as an abstract description of a heap. The conditions are for technical convenience only and are not restrictive. For instance, tautological test formulæ of the form $|h| \ge 0$ and/or $|h| < \infty$ may be added if needed, so that the first condition holds. If $M$ contains two literals $t \ge n_1$ and $t \ge n_2$ with $n_1 < n_2$ and $t \in \{|h|, |U|\}$, then $t \ge n_1$ is redundant and can be removed; similarly if $M$ contains literals $|h| \ge |U| - n_1$ and $|h| \ge |U| - n_2$. Heterogeneous constraints are merged by performing a case split on the value of $|U|$. For example, if $M$ contains both $|h| \ge |U| - 4$ and $|h| \ge 1$, then the first constraint prevails if $|U| \ge 5$, yielding the equivalent disjunction: $(|h| \ge 1 \land |U| < 5) \lor (|h| \ge |U| - 4 \land |U| \ge 5)$. Thus, in the following, we assume that any conjunction of literals can be transformed into a disjunction of minterms [8].

**Definition 4.** *Given a minterm* M*, we define the sets:*

$$\begin{array}{ll}
M^e \stackrel{\text{def}}{=} M \cap \{ x \approx y,\ \neg x \approx y \mid x, y \in \mathsf{Var} \} & M^a \stackrel{\text{def}}{=} M \cap \{ \mathsf{alloc}(x),\ \neg\mathsf{alloc}(x) \mid x \in \mathsf{Var} \} \\
M^u \stackrel{\text{def}}{=} M \cap \{ |U| \ge n,\ |U| < n \mid n \in \mathbb{N} \} & M^p \stackrel{\text{def}}{=} M \cap \{ x \hookrightarrow \mathbf{y},\ \neg x \hookrightarrow \mathbf{y} \mid x \in \mathsf{Var},\ \mathbf{y} \in \mathsf{Var}^k \}
\end{array}$$

Thus, $M = M^e \cup M^u \cup M^a \cup M^p \cup \{|h| \ge \mathsf{min}_M,\ |h| < \mathsf{max}_M\}$, for each minterm $M$. Given a set of variables $X \subseteq \mathsf{Var}$, a minterm $M$ is (1) *E-complete* for $X$ iff for all $x, y \in X$, exactly one of $x \approx y \in M$, $\neg x \approx y \in M$ holds; and (2) *A-complete* for $X$ iff for each $x \in X$, exactly one of $\mathsf{alloc}(x) \in M$, $\neg\mathsf{alloc}(x) \in M$ holds.

For a literal $\ell$, we denote by $\overline{\ell}$ its complement, i.e. $\overline{\theta} \stackrel{\text{def}}{=} \neg\theta$ and $\overline{\neg\theta} \stackrel{\text{def}}{=} \theta$, where $\theta$ is a test formula. Let $\overline{M}$ be the minterm obtained from $M$ by replacing each literal with its complement. The *complement closure* of $M$ is $\mathsf{cc}(M) \stackrel{\text{def}}{=} M \cup \overline{M}$. Two tuples $\mathbf{y}, \mathbf{y}' \in \mathsf{Var}^k$ are *$M$-distinct* if $y_i \not\approx_M y'_i$, for some $i \in [1, k]$. Given a minterm $M$ that is E-complete for $\mathrm{var}(M)$, its *points-to closure* is $\mathsf{pc}(M) \stackrel{\text{def}}{=} \bot$ if there exist literals $x \hookrightarrow \mathbf{y}, x' \hookrightarrow \mathbf{y}' \in M$ such that $x \approx_M x'$ and $\mathbf{y}, \mathbf{y}'$ are $M$-distinct, and $\mathsf{pc}(M) \stackrel{\text{def}}{=} M$ otherwise. Intuitively, $\mathsf{pc}(M)$ is $\bot$ iff $M$ contradicts the fact that the heap is a partial function<sup>3</sup>. The *domain closure* of $M$ is $\mathsf{dc}(M) \stackrel{\text{def}}{=} \bot$ if either $\mathsf{min}_M = n_1$ and $\mathsf{max}_M = n_2$ for some $n_1, n_2 \in \mathbb{Z}$ such that $n_1 \ge n_2$, or $\mathsf{min}_M = |U| - n_1$ and $\mathsf{max}_M = |U| - n_2$, where $n_2 \ge n_1$; and otherwise:

$$\begin{array}{rl}
\mathsf{dc}(M) \stackrel{\text{def}}{=} M & \cup\ \left\{ |U| \ge \left\lceil \sqrt[k]{\max_{x \in \mathsf{av}(M)} (\delta_x(M) + 1)} \right\rceil \right\} \\
& \cup\ \{ |U| \ge n_1 + n_2 + 1 \mid \mathsf{min}_M = n_1,\ \mathsf{max}_M = |U| - n_2,\ n_1, n_2 \in \mathbb{N} \} \\
& \cup\ \{ |U| < n_1 + n_2 \mid \mathsf{min}_M = |U| - n_1,\ \mathsf{max}_M = n_2,\ n_1, n_2 \in \mathbb{N} \}
\end{array}$$

where $\delta_x(M)$ is the number of pairwise $M$-distinct tuples $\mathbf{y}$ for which there exists a literal $\neg x' \hookrightarrow \mathbf{y} \in M$ such that $x \approx_M x'$. Intuitively, $\mathsf{dc}(M)$ asserts that $\mathsf{min}_M < \mathsf{max}_M$ and that the domain contains enough elements to allocate all cells. Essentially, given a structure $(U, s, h)$, if $h(x)$ is known to be defined and distinct from $n$ pairwise distinct vectors of locations $\mathbf{v}_1, \dots, \mathbf{v}_n$, then at least $n + 1$ vectors must exist. Since there are $||U||^k$ vectors of length $k$, we must have $||U||^k \ge n + 1$, hence $||U|| \ge \sqrt[k]{n+1}$. For instance, if $M = \{\neg x \hookrightarrow y_i,\ \mathsf{alloc}(x),\ \neg y_i \approx y_j \mid i, j \in [1, n],\ i \neq j\}$, then $M$ is unsatisfiable if there are fewer than $n + 1$ locations, since $x$ cannot be allocated in this case.
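As a concrete instance of the cardinality bound (our own worked example): for $k = 2$, a variable $x \in \mathsf{av}(M)$ whose image must avoid $\delta_x(M) = 8$ pairwise $M$-distinct pairs yields

$$||U||^2 \ge 8 + 1 = 9, \qquad \text{hence} \qquad ||U|| \ge \left\lceil \sqrt[2]{9}\, \right\rceil = 3,$$

so $\mathsf{dc}(M)$ adds the test formula $|U| \ge 3$.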

**Definition 5.** *A minterm* $M$ *is* footprint-consistent<sup>4</sup> *if for all* $x, x' \in \mathsf{Var}$ *and* $\mathbf{y}, \mathbf{y}' \in \mathsf{Var}^k$*, such that* $x \approx_M x'$ *and* $y_i \approx_M y'_i$ *for all* $i \in [1, k]$*, we have (1) if* $\mathsf{alloc}(x) \in M$ *then* $\neg\mathsf{alloc}(x') \notin M$*, and (2) if* $x \hookrightarrow \mathbf{y} \in M$ *then* $\neg\mathsf{alloc}(x'), \neg x' \hookrightarrow \mathbf{y}' \notin M$*.*

We are now ready to define a boolean combination of test formulæ that is equivalent to $M_1 * M_2$, where $M_1$ and $M_2$ are minterms satisfying a number of additional conditions. Let $\mathsf{npto}(M_1, M_2) \stackrel{\text{def}}{=} (M_1 \cap M_2) \cap \{\neg x \hookrightarrow \mathbf{y} \mid x \notin \mathsf{av}(M_1 \cup M_2),\ \mathbf{y} \in \mathsf{Var}^k\}$ be the set of negative points-to literals common to $M_1$ and $M_2$, involving left-hand side variables not allocated in either $M_1$ or $M_2$.

**Lemma 3.** *Let* $M_1$*,* $M_2$ *be two footprint-consistent minterms that are E-complete for* $\mathrm{var}(M_1 \cup M_2)$*, with* $\mathsf{cc}(M_1^p) = \mathsf{cc}(M_2^p)$*. Then* $M_1 * M_2 \equiv \mathsf{elim}_*(M_1, M_2)$*, where:*

$$\begin{array}{rlr}
\mathsf{elim}_*(M_1, M_2) \stackrel{\text{def}}{=} & M_1^e \land M_2^e \land \mathsf{dc}(M_1)^u \land \mathsf{dc}(M_2)^u \ \land & (2) \\[2pt]
& \displaystyle\bigwedge_{x \in \mathsf{av}(M_1),\ y \in \mathsf{av}(M_2)} \neg x \approx y \ \land\ \mathsf{fp}_a(M_1) \land \mathsf{fp}_a(M_2) \ \land & (3) \\[2pt]
& \mathsf{nalloc}(\mathsf{nv}(M_1) \cap \mathsf{nv}(M_2)) \ \land\ \mathsf{npto}(M_1, M_2) \ \land & (4) \\[2pt]
& |h| \ge \mathsf{min}_{M_1} + \mathsf{min}_{M_2} \ \land\ |h| < \mathsf{max}_{M_1} + \mathsf{max}_{M_2} - 1 & (5) \\[2pt]
& \land\ \eta_{12} \land \eta_{21} & (6)
\end{array}$$

*and, for* $i \neq j \in \{1, 2\}$*:*

$$\eta_{ij} \stackrel{\text{def}}{=} \bigwedge_{Y \subseteq \mathsf{nv}(M_j) \setminus \mathsf{nv}(M_i)} \mathsf{alloc}(Y) \to \left( \begin{array}{l} |h| \ge \#_a(M_i) + |Y|_{M_i} + \mathsf{min}_{M_j} \\ \land\ \#_a(M_i) + |Y|_{M_i} < \mathsf{max}_{M_i} \end{array} \right)$$

<sup>3</sup> Note that we do not assert the equality $\mathbf{y} \approx \mathbf{y}'$; instead, we only check that it is not falsified. This is sufficient for our purpose, because in the following we always assume that the considered minterms are E-complete.

<sup>4</sup> Footprint-consistency is a necessary, yet not sufficient, condition for the satisfiability of minterms. For example, the minterm $M = \{x \hookrightarrow \mathbf{y},\ x \hookrightarrow \mathbf{y}',\ \neg y \approx y',\ |h| < 2\}$ is at the same time footprint-consistent and unsatisfiable.

Intuitively, if $M_1$ and $M_2$ hold separately, then all heap-independent literals from $M_1 \cup M_2$ must be satisfied (2), the variables allocated in $M_1$ and $M_2$ must be pairwise distinct, and their footprints, relative to the allocated variables, jointly asserted (3). Moreover, variables unallocated on both sides must not be allocated, and common negative points-to literals must be asserted (4). Since the heap satisfying $\mathsf{elim}_*(M_1, M_2)$ is the disjoint union of the heaps for $M_1$ and $M_2$, its bounds are the sums of the bounds on both sides (5); moreover, the variables that $M_2$ never allocates [$\mathsf{nv}(M_2)$] may occur allocated in the heap of $M_1$ and vice versa, whence the constraints $\eta_{12}$ and $\eta_{21}$, respectively (6).
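The bounds in (5) can be checked by a one-line computation (our own sanity check, assuming finite maxima; the case $\mathsf{max}_{M_i} = \infty$ is immediate): if $h = h_1 \uplus h_2$ with $\mathsf{min}_{M_i} \le ||h_i|| \le \mathsf{max}_{M_i} - 1$, for $i = 1, 2$, then

$$\mathsf{min}_{M_1} + \mathsf{min}_{M_2} \;\le\; ||h_1|| + ||h_2|| = ||h|| \;\le\; \mathsf{max}_{M_1} + \mathsf{max}_{M_2} - 2 \;<\; \mathsf{max}_{M_1} + \mathsf{max}_{M_2} - 1.$$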

Next, we show a similar result for the separating implication. For technical convenience, we translate the septraction $M_1 \mathrel{-\!\!\diamond} M_2$, instead of $M_1 \mathrel{-\!\!*} M_2$, as an equivalent boolean combination of test formulæ. This is without loss of generality, because $M_1 \mathrel{-\!\!*} M_2 \equiv \neg(M_1 \mathrel{-\!\!\diamond} \neg M_2)$. Unlike the case of the separating conjunction (Lemma 3), here the definition of the boolean combination of test formulæ depends on whether the universe is finite or infinite.

If the complement $\overline{\ell}$ of some literal $\ell \in \mathsf{fp}_a(M_1)$ belongs to $M_2$, then no extension by a heap satisfying $M_1$ may satisfy $M_2$. Therefore, as an additional simplifying assumption, we suppose that $\overline{\mathsf{fp}_a(M_1)} \cap M_2 = \emptyset$, so that $M_1 \mathrel{-\!\!\diamond} M_2$ is not trivially unsatisfiable. We write $\phi \equiv_{\mathit{fin}} \psi$ [$\phi \equiv_{\mathit{inf}} \psi$] if $\phi$ has the same truth value as $\psi$ in all finite [infinite] structures.

**Lemma 4.** *Let* $M_1$ *and* $M_2$ *be footprint-consistent minterms that are E-complete for* $\mathrm{var}(M_1 \cup M_2)$*, such that:* $M_1$ *is A-complete for* $\mathrm{var}(M_1 \cup M_2)$*,* $M_2^a \cup M_2^p \subseteq \mathsf{cc}(M_1^a \cup M_1^p)$ *and* $\overline{\mathsf{fp}_a(M_1)} \cap M_2 = \emptyset$*.*

*Then* $M_1 \mathrel{-\!\!\diamond} M_2 \equiv_{\mathit{fin}} \mathsf{elim}_{-\!\!\diamond}^{\mathit{fin}}(M_1, M_2)$ *and* $M_1 \mathrel{-\!\!\diamond} M_2 \equiv_{\mathit{inf}} \mathsf{elim}_{-\!\!\diamond}^{\mathit{inf}}(M_1, M_2)$*, where:*

$$\begin{array}{rlr}
\mathsf{elim}_{-\!\!\diamond}^{\dagger}(M_1, M_2) \stackrel{\text{def}}{=} & \mathsf{pc}(M_1)^e \land M_2^e \land \mathsf{dc}(M_1)^u \land \mathsf{dc}(M_2)^u \ \land & (7) \\[2pt]
& \mathsf{nalloc}(\mathsf{av}(M_1)) \ \land\ \mathsf{fp}_{\mathsf{nv}(M_1)}(M_2) \ \land & (8) \\[2pt]
& |h| \ge \mathsf{min}_{M_2} - \mathsf{max}_{M_1} + 1 \ \land\ |h| < \mathsf{max}_{M_2} - \mathsf{min}_{M_1} & (9) \\[2pt]
& \land\ \lambda^{\dagger} & (10)
\end{array}$$

*with*

$$\lambda^{\mathit{fin}} \stackrel{\text{def}}{=} \bigwedge_{Y \subseteq \mathrm{var}(M_1 \cup M_2)} \mathsf{nalloc}(Y) \to \left( \begin{array}{l} |h| < |U| - \mathsf{min}_{M_1} - \#_n(Y, M_1) + 1 \\ \land\ |U| \ge \mathsf{min}_{M_2} + \#_n(Y, M_1) \end{array} \right), \qquad \lambda^{\mathit{inf}} \stackrel{\text{def}}{=} \top$$

A heap satisfies $M_1 \mathrel{-\!\!\diamond} M_2$ iff it has an extension, by a disjoint heap satisfying $M_1$, that satisfies $M_2$. Thus, $\mathsf{elim}_{-\!\!\diamond}^{\dagger}(M_1, M_2)$ must entail the heap-independent literals of both $M_1$ and $M_2$ (7). Next, no variable allocated by $M_1$ may be allocated by $\mathsf{elim}_{-\!\!\diamond}^{\dagger}(M_1, M_2)$, otherwise no extension by a heap satisfying $M_1$ is possible; moreover, the footprint of $M_2$ relative to the unallocated variables of $M_1$ must be asserted (8). The heap's cardinality constraints depend on the bounds of $M_1$ and $M_2$ (9), and if $Y$ is a set of variables not allocated in the heap, these variables can be allocated in the extension (10). Actually, this is where the finite-universe assumption first comes into play. If the universe is infinite, then there are always enough locations outside the heap to be assigned to $Y$. However, if the universe is finite, then it is necessary to ensure that there are at least $\#_n(Y, M_1)$ free locations to be assigned to $Y$ (10).

#### **4.2 Translating Quantifier-Free SL***<sup>k</sup>* **into Minterms**

We prove next that each quantifier-free SL<sup>k</sup> formula is equivalent to a finite disjunction of minterms:

**Lemma 5.** *Given a quantifier-free* SL<sup>k</sup> *formula* φ*, there exist two sets of minterms* μ*fin*(φ) *and* μ*inf*(φ) *such that the following equivalences hold: (1)* φ ≡*fin* ⋁<sub>M∈μ*fin*(φ)</sub> M *and (2)* φ ≡*inf* ⋁<sub>M∈μ*inf*(φ)</sub> M*.*

The formal definition of <sup>μ</sup>*fin* (φ) and μ*inf*(φ) is given in [8] and omitted for the sake of conciseness and readability. Intuitively, these sets are defined by induction on the structure of the formula. For base cases, the following equivalences are used:

$$x \mapsto \mathbf{y} \equiv x \hookrightarrow \mathbf{y} \land |h| \approx 1 \qquad \mathsf{emp} \equiv |h| \approx 0 \qquad x \approx y \equiv x \approx y \land |h| \ge 0 \land |h| < \infty$$

For formulæ ¬ψ<sub>1</sub> or ψ<sub>1</sub> ∧ ψ<sub>2</sub>, the transformation is first applied recursively on ψ<sub>1</sub> and ψ<sub>2</sub>, then the obtained formula is transformed into dnf. For formulæ ψ<sub>1</sub> ∗ ψ<sub>2</sub> or ψ<sub>1</sub> −∗ ψ<sub>2</sub>, the transformation is applied on ψ<sub>1</sub> and ψ<sub>2</sub>, then the following equivalences are used to shift ∗ and −∗ innermost in the formula:

$$\begin{array}{ll} \left(\phi\_{1}\lor\phi\_{2}\right)\*\phi \equiv \left(\phi\_{1}\*\phi\right)\lor\left(\phi\_{2}\*\phi\right) & \left(\phi\_{1}\lor\phi\_{2}\right)\multimap\phi \equiv \left(\phi\_{1}\multimap\phi\right)\lor\left(\phi\_{2}\multimap\phi\right) \\ \phi\*\left(\phi\_{1}\lor\phi\_{2}\right) \equiv \left(\phi\*\phi\_{1}\right)\lor\left(\phi\*\phi\_{2}\right) & \phi\multimap\left(\phi\_{1}\lor\phi\_{2}\right) \equiv \left(\phi\multimap\phi\_{1}\right)\lor\left(\phi\multimap\phi\_{2}\right) \end{array}$$
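As a minimal illustration (not part of the paper), the four distribution laws above can be read as a terminating rewriting pass over formula syntax trees. The tuple encoding below, with `"sep"` for ∗ and `"wand"` for −∗, is our own hypothetical one:

```python
# Sketch: push "sep" (*) and "wand" (-*) below "or" using the four
# distribution equivalences, until their operands are disjunction-free.
# Atoms/minterms are represented as plain strings.

def distribute(f):
    if isinstance(f, str):          # atom or minterm placeholder
        return f
    op, *args = f
    args = [distribute(a) for a in args]
    if op in ("sep", "wand"):
        left, right = args
        if isinstance(left, tuple) and left[0] == "or":
            # (phi1 v phi2) op phi  ~~>  (phi1 op phi) v (phi2 op phi)
            return distribute(("or", (op, left[1], right), (op, left[2], right)))
        if isinstance(right, tuple) and right[0] == "or":
            # phi op (phi1 v phi2)  ~~>  (phi op phi1) v (phi op phi2)
            return distribute(("or", (op, left, right[1]), (op, left, right[2])))
    return (op, *args)

# (M1 v M2) * M3  ~~>  (M1 * M3) v (M2 * M3)
print(distribute(("sep", ("or", "M1", "M2"), "M3")))
# → ('or', ('sep', 'M1', 'M3'), ('sep', 'M2', 'M3'))
```

Repeated application terminates because each rewrite strictly lowers the number of disjunctions below a ∗ or −∗.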

Afterwards, the operands of ∗ and −∗ are minterms, and the result is obtained using the equivalences in Lemmas 3 and 4, respectively (up to a transformation into dnf). The only difficulty is that these lemmas impose some additional conditions on the minterms (e.g., being E-complete or A-complete). However, the conditions are easy to enforce by case splitting, as illustrated by the following example:

*Example 1.* Consider the formula x ↦ x −∗ y ↦ y. It is easy to check that μ<sup>†</sup>(x ↦ x) = {M<sub>1</sub>} for † ∈ {*fin*, *inf*}, where M<sub>1</sub> = x ↪ x ∧ |h| ≥ 1 ∧ |h| < 2, and that μ<sup>†</sup>(y ↦ y) = {M<sub>2</sub>}, where M<sub>2</sub> = y ↪ y ∧ |h| ≥ 1 ∧ |h| < 2. To apply Lemma 4, we need to ensure that M<sub>1</sub> and M<sub>2</sub> are E-complete, which may be done by adding either x ≈ y or x ≉ y to each minterm. We also have to ensure that M<sub>1</sub> is A-complete, thus for z ∈ {x, y} we add either alloc(z) or ¬alloc(z) to M<sub>1</sub>. Finally, we must have M<sub>2</sub><sup>a</sup> ∪ M<sub>2</sub><sup>p</sup> ⊆ cc(M<sub>1</sub><sup>a</sup> ∪ M<sub>1</sub><sup>p</sup>), thus we add either y ↪ y or ¬(y ↪ y) to M<sub>1</sub>. After removing redundancies, we get (among others) the minterms M′<sub>1</sub> = x ↪ x ∧ |h| ≥ 1 ∧ |h| < 2 ∧ x ≉ y and M′<sub>2</sub> = y ↪ y ∧ |h| ≥ 1 ∧ |h| < 2 ∧ x ≉ y. Afterwards we compute elim<sup>*fin*</sup><sub>−∗</sub>(M′<sub>1</sub>, M′<sub>2</sub>) = x ≉ y ∧ ¬alloc(x) ∧ |h| ≥ 0 ∧ |h| < 1.

As explained in Sect. 3.1, boolean combinations of minterms can only be transformed into sat-equivalent BSR(FO) formulæ if there is no positive occurrence of a test formula |h| ≥ |U| − n or alloc(x) (see the conditions in Lemmas 1 (2) and 2). Consequently, we relate the polarity of these formulæ in some minterm M ∈ μ*fin*(φ) ∪ μ*inf*(φ) with that of a separating implication within φ. The analysis depends on whether the universe is finite or infinite.

**Lemma 6.** *For any quantifier-free* SL<sup>k</sup> *formula* <sup>φ</sup>*, the following properties hold:*


Given a quantifier-free SL<sup>k</sup> formula φ, the number of minterms occurring in <sup>μ</sup>*fin* (φ) [μ*inf*(φ)] is exponential in the size of φ, in the worst case. Therefore, an optimal decision procedure cannot generate and store these sets explicitly, but rather must enumerate minterms lazily. We show that (i) the size of the minterms in <sup>μ</sup>*fin* (φ) <sup>∪</sup> μ*inf*(φ) is bounded by a polynomial in the size of φ, and that (ii) the problem "*given a minterm M, does M occur in* <sup>μ</sup>*fin* (φ) *[resp. in* μ*inf*(φ)*]?*" is in PSPACE. To this aim, we define a measure on a quantifier-free formula φ, which bounds the size of the minterms in the sets <sup>μ</sup>*fin* (φ) and μ*inf*(φ), inductively on the structure of the formulæ:

$$\begin{array}{ll} \mathcal{M}(x \approx y) \stackrel{\text{def}}{=} 0 & \mathcal{M}(\bot) \stackrel{\text{def}}{=} 0\\ \mathcal{M}(\mathsf{emp}) \stackrel{\text{def}}{=} 1 & \mathcal{M}(x \mapsto \mathbf{y}) \stackrel{\text{def}}{=} 2\\ \mathcal{M}(\neg \phi\_{1}) \stackrel{\text{def}}{=} \mathcal{M}(\phi\_{1}) & \mathcal{M}(\phi\_{1} \wedge \phi\_{2}) \stackrel{\text{def}}{=} \max(\mathcal{M}(\phi\_{1}), \mathcal{M}(\phi\_{2}))\\ \mathcal{M}(\phi\_{1} \* \phi\_{2}) \stackrel{\text{def}}{=} \sum\_{i=1}^{2} (\mathcal{M}(\phi\_{i}) + ||\mathsf{var}(\phi\_{i})||) & \mathcal{M}(\phi\_{1} \multimap \phi\_{2}) \stackrel{\text{def}}{=} \sum\_{i=1}^{2} (\mathcal{M}(\phi\_{i}) + ||\mathsf{var}(\phi\_{i})||) \end{array}$$
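For illustration only, the recursive definition of the measure can be transcribed directly; the tuple-based formula encoding and helper names below are assumptions of this sketch, not notation from the paper:

```python
# Sketch of the measure M above over a hypothetical AST: ("eq", x, y),
# ("bot",), ("emp",), ("pto", x, ys), ("not", f), ("and", f, g),
# ("sep", f, g) for *, ("wand", f, g) for -*.

def var(f):
    """Collect the variables occurring in a formula."""
    tag = f[0]
    if tag == "eq":  return {f[1], f[2]}
    if tag == "pto": return {f[1], *f[2]}
    if tag in ("bot", "emp"): return set()
    return set().union(*(var(g) for g in f[1:]))

def measure(f):
    tag = f[0]
    if tag in ("eq", "bot"): return 0
    if tag == "emp": return 1
    if tag == "pto": return 2
    if tag == "not": return measure(f[1])
    if tag == "and": return max(measure(f[1]), measure(f[2]))
    # "sep" / "wand": sum of operand measures plus their variable counts
    return sum(measure(g) + len(var(g)) for g in f[1:])

phi = ("sep", ("pto", "x", ("x",)), ("eq", "x", "y"))  # x |-> (x) * x = y
print(measure(phi))   # → 5, i.e. (2 + 1) + (0 + 2)
```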

**Definition 6.** *A minterm* M *is* M-bounded *by a formula* φ *if, for each literal* ℓ ∈ M*, the following hold: (i)* M(ℓ) ≤ M(φ) *if* ℓ ∈ {|h| ≥ min<sub>M<sub>i</sub></sub>, |h| < max<sub>M<sub>i</sub></sub>}*; (ii)* M(ℓ) ≤ 2M(φ) + 1 *if* ℓ ∈ {|U| ≥ n, |U| < n | n ∈ ℕ}*.*

The following lemma provides the desired result:

**Lemma 7.** *Given a quantifier-free* SL<sup>k</sup> *formula* φ*, each minterm* M ∈ μ*fin*(φ) ∪ μ*inf*(φ) *is* M*-bounded by* φ*.*

The proof goes by a careful analysis of the test formulæ introduced in Lemmas 3 and 4 or created by minterm transformations (see [8] for details). Since M(φ) is polynomially bounded by size(φ), this entails that it is possible to check whether M ∈ μ*fin*(φ) [resp. μ*inf*(φ)] using space bounded by a polynomial in size(φ).

**Lemma 8.** *Given a minterm* <sup>M</sup> *and an* SL<sup>k</sup> *formula* φ*, the problems of checking whether* M <sup>∈</sup> μ*fin* (φ) *and* <sup>M</sup> <sup>∈</sup> <sup>μ</sup>*inf*(φ) *are in PSPACE.*

*Remark 1.* Observe that the formulæ elim<sub>∗</sub>(M<sub>1</sub>, M<sub>2</sub>) and elim<sup>*fin*</sup><sub>−∗</sub>(M<sub>1</sub>, M<sub>2</sub>) in Lemmas 3 and 4 are of exponential size, because Y ranges over sets of variables. However, these formulæ do not need to be constructed explicitly: to check that M ∈ μ*fin*(φ) or M ∈ μ*inf*(φ), we only have to guess such sets Y. See [8] for details.

# **5 Bernays-Schönfinkel-Ramsey SL***<sup>k</sup>*

This section gives the results concerning decidability of the (in)finite satisfiability problems within the BSR(SL<sup>k</sup>) fragment. BSR(SL<sup>k</sup>) is the set of sentences <sup>∀</sup>y<sup>1</sup> ... <sup>∀</sup>y<sup>m</sup> . φ, where <sup>φ</sup> is a quantifier-free SL<sup>k</sup> formula, with var(φ) = {x<sup>1</sup> ,...,x<sup>n</sup> , y<sup>1</sup> ,...,y<sup>m</sup>}, where the existentially quantified variables <sup>x</sup>1,...,x<sup>n</sup> are left free. First, we show that, contrary to BSR(FO), the satisfiability of BSR(SL<sup>k</sup>) is undecidable for k <sup>≥</sup> 2. Second, we carve two nontrivial fragments of BSR(SL<sup>k</sup>), for which the infinite and finite satisfiability problems are both PSPACE-complete. These fragments are defined based on restrictions of (i) polarities of the occurrences of the separating implication, and (ii) occurrences of universally quantified variables in the scope of separating implications. These results draw a rather precise chart of decidability within the BSR(SL<sup>k</sup>) fragment. For k = 1, the satisfiability problem of BSR(SL<sup>1</sup>) is in PSPACE [7] (it is undecidable for arbitrary SL<sup>1</sup> formulæ [2] and decidable but nonelementary for *prenex* formulæ [7]).

#### **5.1 Undecidability of BSR(SL***<sup>k</sup>* **)**

**Theorem 1.** *The finite and infinite satisfiability problems are both undecidable for* BSR(SL<sup>k</sup>)*.*

We provide a brief sketch of the proof; see [8] for details. We consider the finite satisfiability problem for the [∀,(0),(2)]<sup>=</sup> fragment of FO, which consists of sentences of the form ∀x.φ(x), where φ is a quantifier-free boolean combination of atomic propositions t<sub>1</sub> ≈ t<sub>2</sub>, and t<sub>1</sub>, t<sub>2</sub> are terms built using two function symbols f and g of arity one, the variable x and a constant c. It is known (see, e.g., [1, Theorem 4.1.8]) that finite satisfiability is undecidable for [∀,(0),(2)]<sup>=</sup>. We reduce this problem to BSR(SL<sup>k</sup>) satisfiability. The idea is to encode the values of f and g into the heap, in such a way that every element x points to (f(x), g(x)). Given a sentence ϕ = ∀x.φ(x) in [∀,(0),(2)]<sup>=</sup>, we proceed by first *flattening* each term in φ consisting of nested applications of f and g. The result is an equivalent sentence ϕ*flat* = ∀x<sub>1</sub> ... ∀x<sub>n</sub> . φ*flat*, in which the only terms are x<sub>i</sub>, c, f(x<sub>i</sub>), g(x<sub>i</sub>), f(c) and g(c), for i ∈ [1, n]. For example, the formula ∀x.f(g(x)) ≈ c is flattened into ∀x<sub>1</sub>∀x<sub>2</sub> . g(x<sub>1</sub>) ≉ x<sub>2</sub> ∨ f(x<sub>2</sub>) ≈ c. We define the following BSR(SL<sup>2</sup>) sentences ϕ<sup>†</sup><sub>sl</sub>, for † ∈ {*fin*, *inf*}:

$$\alpha^{\dagger} \wedge x\_c \hookrightarrow (y\_c, z\_c) \wedge \forall x\_1 \ldots \forall x\_n \forall y\_1 \ldots \forall y\_n \forall z\_1 \ldots \forall z\_n \;.\; \bigwedge\_{i=1}^n (x\_i \hookrightarrow (y\_i, z\_i) \to \phi\_{\mathsf{sl}}) \tag{11}$$

with α*fin* ≜ ∀x . alloc(x) or, equivalently, α*fin* ≜ |h| ≥ |U| − 0, and α*inf* ≜ ∀x∀y∀z . x ↪ (y, z) → alloc(y) ∧ alloc(z), where φ<sub>sl</sub> is obtained from φ*flat* by replacing each occurrence of c by x<sub>c</sub>, each term f(c) [g(c)] by y<sub>c</sub> [z<sub>c</sub>], and each term f(x<sub>i</sub>) [g(x<sub>i</sub>)] by y<sub>i</sub> [z<sub>i</sub>]. Intuitively, α*fin* asserts that the heap is a total function, and α*inf* states that every referenced cell is allocated<sup>5</sup>. It is easy to check that ϕ and ϕ<sup>†</sup><sub>sl</sub> are equisatisfiable. The undecidability result still holds for finite satisfiability if a single occurrence of −∗ is allowed, in a (ground) formula |h| ≥ |U| − 0 (see the definition of α*fin* above).
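The flattening step used in this reduction can be sketched as follows; the term encoding, the fresh-variable naming scheme and the helper `flatten` are hypothetical, but on ∀x . f(g(x)) ≈ c the output matches the example above up to renaming of the fresh universal variables:

```python
# Sketch: terms are ("var", v), ("const", "c"), or (fsym, t) with unary
# fsym in {"f", "g"}.  Each nested application is named by a fresh
# universally quantified variable; since (g(x) ≈ v) -> psi is written as
# g(x) ≉ v v psi, each naming contributes a disequality disjunct.

def flatten(term, guards, counter=[0]):
    """Return a depth-<=1 term equal to `term` under the collected guards.
    The mutable default `counter` deliberately persists, giving globally
    fresh variable names across calls."""
    if term[0] in ("var", "const"):
        return term
    fsym, arg = term
    arg = flatten(arg, guards, counter)
    if arg[0] in ("var", "const"):
        return (fsym, arg)
    counter[0] += 1
    v = ("var", f"x{counter[0]}")
    guards.append(("neq", arg, v))      # disjunct: arg ≉ v
    return (fsym, v)

# forall x . f(g(x)) ≈ c  ~~>  forall x, x1 . g(x) ≉ x1 v f(x1) ≈ c
guards = []
flat = flatten(("f", ("g", ("var", "x"))), guards)
print(flat, guards)
```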

#### **5.2 Two Decidable Fragments of BSR(SL***<sup>k</sup>* **)**

The reductions (11) use either positive occurrences of alloc(x), where x is universally quantified, or test formulæ |h| ≥ |U| − n. We obtain decidable subsets of BSR(SL<sup>k</sup>) by eliminating the positive occurrences of both (i) alloc(x), with x universally quantified, and (ii) |h| ≥ |U| − n, from μ<sup>†</sup>(φ), where † ∈ {*fin*, *inf*} and ∀y<sub>1</sub> ... ∀y<sub>m</sub> . φ is any BSR(SL<sup>k</sup>) formula. Note that μ*inf*(φ) contains no formulæ of the form |h| ≥ |U| − n, which explains why slightly less restrictive conditions suffice for infinite structures.

**Definition 7.** *Given an integer* k <sup>≥</sup> <sup>1</sup>*, we define:*


Note that BSR*fin*(SL<sup>k</sup>) ⊆ BSR*inf*(SL<sup>k</sup>) ⊆ BSR(SL<sup>k</sup>), for any k ≥ 1.

*Remark 2.* Because the polarity of the antecedent of a −∗ is neutral, Definition 7 imposes no constraint on the occurrences of separating implications at the *left* of a −∗<sup>6</sup>.

The decidability result of this paper is stated below:

**Theorem 2.** *For any integer* k <sup>≥</sup> <sup>1</sup> *not depending on the input, the infinite satisfiability problem for* BSR*inf* (SL<sup>k</sup>) *and the finite satisfiability problem for* BSR*fin*(SL<sup>k</sup>) *are both PSPACE-complete.*

We provide a brief sketch of the proof (all details are available in [8]). In both cases, PSPACE-hardness is an immediate consequence of the fact that the quantifier-free fragment of SL<sup>k</sup>, without the separating implication but with the separating conjunction and negation, is PSPACE-hard [4]. For PSPACE-membership, consider a formula ϕ in BSR*inf*(SL<sup>k</sup>) and its equivalent disjunction of minterms ϕ′ (of exponential size). Lemma 8 gives us an upper bound on the size of test

<sup>5</sup> Note that the two definitions of α*fin* are equivalent. The formula α*fin* is unsatisfiable on infinite universes, which explains why the definitions of α*fin* and α*inf* differ.

<sup>6</sup> The idea is that if a formula alloc(x) or |h| ≥ |U| − n occurs in the antecedent of a −∗, then it will be eliminated by the transformation in Lemma 4. In contrast, such test formulæ will not be eliminated if they occur in the consequent of a −∗.

formulæ in ϕ′, hence on the number of constant symbols occurring in τ(ϕ′). This, in turn, gives a bound on the cardinality of the model of τ(ϕ′). We may thus guess such an interpretation and check that it is indeed a model of τ(ϕ′) by enumerating all the minterms in ϕ′ (this is feasible in polynomial space thanks to Lemma 8) and translating them on the fly into first-order formulæ. The only subtle point is that the model obtained in this way is finite, whereas our aim is to test that the obtained formula has an *infinite* model. This difficulty can be overcome by adding an axiom ensuring that the domain contains more *unallocated* elements than the total number of constant symbols and variables in the formula. This is sufficient to prove that the obtained model – although finite – can be extended into an infinite model, obtained by creating infinitely many copies of these elements.

The proof for BSR*fin*(SL<sup>k</sup>) is similar, but far more involved. The problem is that, if the universe is finite, then alloc(x) test formulæ may occur at a positive polarity even if every φ<sub>1</sub> −∗ φ<sub>2</sub> subformula occurs at a negative polarity, due to the positive occurrences of alloc(x) within λ*fin* (10) in the definition of elim<sup>*fin*</sup><sub>−∗</sub>(M<sub>1</sub>, M<sub>2</sub>). As previously discussed, positive occurrences of alloc(x) hinder the translation into BSR(FO), because of the existential quantifiers that may occur in the scope of a universal quantifier. The solution is to distinguish a class of finite structures (U, s, h), the so-called α*-controlled structures*, for some α ∈ ℕ, in which there are locations ℓ<sub>1</sub>, ..., ℓ<sub>α</sub> such that every location ℓ ∈ U either is some ℓ<sub>i</sub> or points to a tuple from the set {ℓ<sub>1</sub>, ..., ℓ<sub>α</sub>, ℓ}. For such structures, the formulæ alloc(x) can be eliminated in a straightforward way, because they are equivalent to ⋀<sup>α</sup><sub>i=1</sub>(x ≈ ℓ<sub>i</sub> → alloc(ℓ<sub>i</sub>)). If the structure is not α-controlled, then we can show that there exist sufficiently many unallocated cells, so that all the cardinality constraints of the form |h| ≤ |U| − n or |U| ≥ n are always satisfied. This ensures that the truth value of the positive occurrences of alloc(x) is irrelevant, because they only occur in formulæ λ*fin* that are always true if all test formulæ |h| ≤ |U| − n or |U| ≥ n are true (see the definition of λ*fin* in Lemma 4).

#### **6 Conclusions and Future Work**

We have studied the decidability problem for SL formulæ with quantifier prefix in the language ∃<sup>∗</sup>∀<sup>∗</sup>, denoted BSR(SL<sup>k</sup>). Although this fragment was found to be undecidable, we identified two non-trivial subfragments for which the infinite and finite satisfiability problems are PSPACE-complete. These fragments are defined by restricting the use of universally quantified variables within the scope of separating implications occurring at positive polarity. Universal quantifiers and separating conjunctions are useful to express local constraints on the shape of the data structure, whereas separating implications allow one to express dynamic transformations of these data structures. As a consequence, separating implications usually occur negatively in the formulæ tested for satisfiability, and the decidable classes found in this work are of great practical interest. Future work involves formalizing and implementing an invariant-checking algorithm based on the above ideas, and using the techniques for proving decidability (namely, the translation of quantifier-free SL<sup>k</sup> formulæ into boolean combinations of test formulæ) to solve other logical problems, such as frame inference, abduction and possibly interpolation.

**Acknowledgments.** The authors wish to acknowledge the contributions of Stéphane Demri and Étienne Lozes to the insightful discussions during the early stages of this work.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Continuous Reachability for Unordered Data Petri Nets is in PTime**

Utkarsh Gupta<sup>1</sup>, Preey Shah<sup>1</sup>, S. Akshay<sup>1(B)</sup>, and Piotr Hofman<sup>2</sup>

<sup>1</sup> Department of CSE, IIT Bombay, Mumbai, India akshayss@cse.iitb.ac.in <sup>2</sup> University of Warsaw, Warsaw, Poland

**Abstract.** Unordered data Petri nets (UDPN) are an extension of classical Petri nets with tokens that carry data from an infinite domain and where transitions may check equality and disequality of tokens. UDPN are well-structured, so the coverability and termination problems are decidable, but with higher complexity than for Petri nets. On the other hand, the reachability problem for UDPN is surprisingly complex, and its decidability status remains open. In this paper, we consider the continuous reachability problem for UDPN, which can be seen as an over-approximation of the reachability problem. Our main result is a characterization of continuous reachability for UDPN and a polynomial-time algorithm for solving it. This is a consequence of a combinatorial argument, which shows that if continuous reachability holds then there exists a run using only polynomially many data values.

**Keywords:** Petri nets · Continuous reachability · Unordered data · Polynomial time

# **1 Introduction**

The theory of Petri nets has been developing for more than 50 years. On the one hand, from a theory perspective, Petri nets are interesting due to their deep mathematical structure: despite exhibiting nice properties, like being a well-structured transition system [1], we still do not understand them well. On the other hand, Petri nets are a useful pictorial formalism for modeling and have thus found their way into industry. To connect this theory and practice, it would be desirable to use the developed theory of Petri nets [2–4] for the symbolic analysis and verification of Petri net models. However, we already know that this is difficult in full generality. It suffices to recall two results that were proved more than 30 years apart. An old but classical result by Lipton [5] shows that even coverability is ExpSpace-hard, while the non-elementary hardness of the reachability relation has just been

Supported by Polish NCN grant UMO-2016/21/D/ST6/01368, DST Inspire faculty award IFA12-MA-17, and DST/CEFIPRA project EQuaVe.

U. Gupta and P. Shah—Contributed equally to this work.

© The Author(s) 2019

M. Bojańczyk and A. Simpson (Eds.): FOSSACS 2019, LNCS 11425, pp. 260–276, 2019. https://doi.org/10.1007/978-3-030-17127-8\_15

established this year [6]. Moreover, when we look at Petri-net-based formalisms that are needed to model various aspects of industrial systems, we see that they go beyond the expressivity of Petri nets. For instance, colored Petri nets, which are used in modeling workflows [7], allow the tokens to be colored with an infinite set of colors and introduce a complex formalism to describe dependencies between colors. This makes all verification problems undecidable for this generic model. Given the basic nature and importance of the reachability problem in Petri nets (and its extensions), there have been several efforts to sidestep the complexity-theoretic hardness results. One common approach is to look for easy subclasses (such as bounded nets [8], free-choice nets [9], etc.). The other approach, which we adopt in this work, is to compute over-approximations of the reachability relation.

*Continuous Reachability.* A natural question regarding the dynamics of a Petri net is: what would happen if tokens, instead of behaving like discrete units, started to behave like a continuous fluid? This simple question led to an elegant theory of so-called continuous Petri nets [10–12]. Petri nets with continuous semantics allow markings to be functions from places to *nonnegative rational numbers* (i.e., in Q<sup>+</sup>) instead of natural numbers. Moreover, whenever a transition is fired, a positive rational coefficient is chosen and both the number of consumed and produced tokens are multiplied by this coefficient. This allows tokens to be split into arbitrarily small parts and processed independently. This may occur, e.g., in applications related to hybrid systems where the discrete part is used to control the continuous system [13,14]. Interestingly, this makes things simpler to analyze. For example, reachability under the continuous semantics for Petri nets is PTime-complete [11]. However, when one wants to analyze extensions of Petri nets, e.g., reset Petri nets with continuous semantics, it turns out that reachability is as hard as reachability in reset Petri nets under the usual semantics, i.e., it is undecidable<sup>1</sup>. In this paper we identify an extension of Petri nets with unordered data for which this is not the case and where continuous semantics leads to a substantial reduction in the complexity of the reachability problem.
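A minimal sketch of this firing rule (the names and the encoding are ours, not from [10–12]): a continuous marking maps places to nonnegative rationals, and firing a transition t with coefficient q consumes q·pre(t) and produces q·post(t).

```python
from fractions import Fraction

def fire(marking, pre, post, q):
    """Fire a transition with pre/post place multisets, scaled by q > 0.
    Enabled only if the marking dominates q * pre."""
    assert q > 0 and all(marking.get(p, Fraction(0)) >= q * n
                         for p, n in pre.items())
    new = dict(marking)
    for p, n in pre.items():
        new[p] = new.get(p, Fraction(0)) - q * n
    for p, n in post.items():
        new[p] = new.get(p, Fraction(0)) + q * n
    return new

# One whole token on p1; firing with coefficient 1/3 moves a third of it to p2.
m = {"p1": Fraction(1)}
m2 = fire(m, pre={"p1": 1}, post={"p2": 1}, q=Fraction(1, 3))
print(m2)   # → {'p1': Fraction(2, 3), 'p2': Fraction(1, 3)}
```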

*Unordered Data Petri Nets.* The possibility of equipping tokens with some additional information is one of the main lines of research regarding extensions of Petri nets, the best known being colored Petri nets [15] and various types of timed Petri nets [16,17]. In [18], the authors equipped tokens with data and restricted interactions between data in a way that allows techniques for well-structured transition systems to be transferred. They identified various classes of nets exhibiting interesting combinatorial properties, which led to a number of results [19–23]. Unordered Data Petri Nets (UDPN) are the simplest among them: every token carries a single datum, like a barcode, and transitions may check equality or disequality of data in consumed and produced tokens. UDPN are the only class identified in [18] for which reachability is still unsolved, although in [20] the authors show that the problem is at least Ackermannian-hard (for all other data extensions, reachability is undecidable). A recent attempt to over-approximate the reachability relation for UDPN in [22]

<sup>1</sup> This can be seen on the same lines as the proof of undecidability of continuous reachability for Petri nets with zero tests [12].

considers integer reachability, i.e., the number of tokens may become negative during the run (also called a solution of the state equation). From this perspective, the present paper is an extension of the mentioned line of research.

*Our Contribution.* Our main contribution is a characterization of continuous reachability in UDPN and a polynomial-time algorithm for solving it. Observe that if we find an upper bound on the minimal number of data values required by a run between two configurations (if any run exists), then we can reduce continuous reachability in UDPN to continuous reachability in vanilla Petri nets, with an exponential blowup, and use the already developed characterization from [11]. In Sect. 5 we prove such a bound on the minimal number of required data values. The bound is novel and exploits techniques that did not appear previously in the context of data nets. Further, the obtained bounds are lower than the bounds on the number of data values required to solve the state equation [22], which is surprising considering that the existence of a continuous run requires a solution of a sort of state equation. Precisely, the difference is that we are looking for solutions of the state equation over Q<sup>+</sup> instead of N, and in this case we prove better bounds on the number of data values required. This also gives us an easy polytime algorithm for finding Q<sup>+</sup>-solutions of state equations of UDPN (we remark that for Petri nets without data, this appears among standard algebraic techniques [24]).

Finally, with the above bound, we solve continuous reachability in UDPN by adapting the techniques from the non-data setting of [12,25]. We adapt the characterization of continuous reachability to the data setting and then encode it as a system of linear equations with implications. In doing so, however, we face the problem that a naive encoding (representing data explicitly) gives a system of equations of exponential size, yielding only an ExpTime algorithm. To improve the complexity, we use histograms, a combinatorial tool developed in [22], to compress the description of solutions of state equations in UDPN. However, this may lead to spurious solutions for continuous reachability. To eliminate them, we show that it suffices to first transform the net and then apply the idea of histograms to characterize continuous runs in the modified net. The whole procedure is described in Sect. 7.3 and leads to our PTime algorithm for continuous reachability in UDPN. Note that since we easily have PTime-hardness for the problem (even without data), we obtain that continuous reachability in UDPN is PTime-complete.

*Towards Verification.* Over-approximations are useful in the verification of Petri nets and their extensions: as explained in [24], for many practical problems, over-approximate solutions are already correct. Further, we can use them as a subroutine to improve the practical performance of verification algorithms. A remarkable example is the recent work in [25], where the PTime continuous reachability algorithm for Petri nets from [11] is used as a subroutine to solve the ExpSpace-hard coverability problem in Petri nets, outperforming the best known tools for this problem, such as Petrinizer [26]. Our results can be seen as a first step in the same spirit towards handling practical instances of coverability, but for the extended model of UDPN, where the coverability problem is known to be Ackermannian-hard [20].

Omitted proofs and details can be found in the extended version at [27].

#### **2 Preliminaries**

We denote the integers, non-negative integers, rationals, and reals by Z, N, Q, and R, respectively. For a set X ⊆ R, denote by X<sup>+</sup> the set of all non-negative elements of X. We denote by **0** a vector whose entries are all zero. We define operations on vectors in the standard point-wise way, i.e., *scalar multiplication* ·, *addition* +, *subtraction* −, and *vector comparison* ≤. In this paper, we use functions of the type X → (Y → Z), and instead of (f(x))(y) we write f(y, x). For functions f, g where the range of g is a subset of the domain of f, we denote their composition by f ◦ g. If π is an injection, then by π<sup>−1</sup> we mean a partial function such that π<sup>−1</sup> ◦ π is the identity function. Let f : X<sub>1</sub> → Y, g : X<sub>2</sub> → Y be two functions with addition and scalar multiplication operations defined on Y. *Scalar multiplication* of a function is defined by (a · f)(x) = a · f(x) for all x ∈ X<sub>1</sub>. We lift the *addition* operation to functions pointwise, i.e., f + g : X<sub>1</sub> ∪ X<sub>2</sub> → Y such that

$$(f+g)(x) = \begin{cases} f(x) & \text{if } x \in X\_1 \backslash X\_2 \\ g(x) & \text{if } x \in X\_2 \backslash X\_1 \\ f(x) + g(x) & \text{if } x \in X\_1 \cap X\_2 \end{cases}$$

Similarly, for *subtraction*, (f − g)(x) = f(x) + (−1) · g(x), and f ≤ g if for all x ∈ X<sub>1</sub> ∪ X<sub>2</sub>, (g − f)(x) ≥ 0.
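A direct transcription of these pointwise operations, representing a finitely-supported function as a Python dict whose absent keys read as 0 (an implementation choice of this sketch, not part of the paper):

```python
# Pointwise operations on finitely-supported functions encoded as dicts.

def add(f, g):
    return {x: f.get(x, 0) + g.get(x, 0) for x in f.keys() | g.keys()}

def scale(a, f):
    return {x: a * v for x, v in f.items()}

def sub(f, g):
    return add(f, scale(-1, g))

def leq(f, g):
    # f <= g iff (g - f)(x) >= 0 for every x in the union of the domains
    return all(v >= 0 for v in sub(g, f).values())

f, g = {"a": 1, "b": 2}, {"b": 1, "c": 4}
print(add(f, g))          # entries a:1, b:3, c:4 (key order may vary)
print(leq(f, add(f, g)))  # → True
```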

We use *matrices* with *rows and columns* indexed by sets S<sub>1</sub>, S<sub>2</sub>, possibly infinite. For a matrix M, let M(r, c) denote the entry at row r and column c, and let M(r, •), M(•, c) denote the row vector indexed by r and the column vector indexed by c, respectively. Denote by *col*(M), *row*(M) the sets of indices of nonzero columns and nonzero rows of the matrix M, respectively. Even if we have infinitely many rows or columns, our matrices will have only finitely many *nonzero* rows and columns, and only this nonzero part will be represented. Following our nonstandard matrix definition, we precisely define the (otherwise natural) operations on matrices. First, *multiplication by a constant* produces a new matrix with rows and columns labelled by the same sets S<sub>1</sub>, S<sub>2</sub>, defined by (a · M)(r, c) = a · (M(r, c)) for all (r, c) ∈ S<sub>1</sub> × S<sub>2</sub>. *Addition* of two matrices is only defined if the sets S<sub>1</sub> indexing rows and S<sub>2</sub> indexing columns are the same for both summands M<sub>1</sub> and M<sub>2</sub>: for all (r, c) ∈ S<sub>1</sub> × S<sub>2</sub>, the sum is (M<sub>1</sub> + M<sub>2</sub>)(r, c) = M<sub>1</sub>(r, c) + M<sub>2</sub>(r, c); the *subtraction* M<sub>1</sub> − M<sub>2</sub> is a shorthand for M<sub>1</sub> + (−1) · M<sub>2</sub>. Observe that all but finitely many entries in our matrices are 0, and therefore when we compute on matrices we can restrict to the rows *row*(M<sub>1</sub>) ∪ *row*(M<sub>2</sub>) and the columns *col*(M<sub>1</sub>) ∪ *col*(M<sub>2</sub>). Similarly, the *comparison* of two matrices M<sub>1</sub>, M<sub>2</sub> is defined as follows: M<sub>1</sub> ≤ M<sub>2</sub> if for all (r, c) ∈ (*row*(M<sub>1</sub>) ∪ *row*(M<sub>2</sub>)) × (*col*(M<sub>1</sub>) ∪ *col*(M<sub>2</sub>)), M<sub>1</sub>(r, c) ≤ M<sub>2</sub>(r, c); the relations >, ≥, and < are defined analogously.
The last operation we need is matrix multiplication M1 · M2 = M3. It is only allowed if the set of columns of the first matrix M1 equals the set of rows of the second matrix M2; the rows and columns of the resulting matrix M3 are the rows of M1 and the columns of M2, respectively, with $M_3(r, c) = \sum_{k} M_1(r, k) M_2(k, c)$, where k runs through the columns of M1. Again, observe that a row or column that is 0 everywhere contributes 0 to the product, so we may restrict to *row*(M1) and *col*(M2). Moreover, in the sum it suffices to take $\sum_{k \in col(M_1)} M_1(r, k) M_2(k, c)$.
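The three operations above translate directly to a sparse representation that stores only the nonzero part, keyed by (row, column) pairs (a sketch for illustration; the function names are our own):

```python
from fractions import Fraction

def scale(a, M):
    """Scalar multiple: (a*M)(r, c) = a * M(r, c)."""
    return {rc: a * v for rc, v in M.items()}

def madd(M1, M2):
    """Entrywise sum; entries absent from a dict are implicitly 0."""
    S = dict(M1)
    for rc, v in M2.items():
        S[rc] = S.get(rc, Fraction(0)) + v
    return {rc: v for rc, v in S.items() if v != 0}

def mmul(M1, M2):
    """(M1*M2)(r, c) = sum over k in col(M1) of M1(r, k) * M2(k, c);
    only the finitely many nonzero entries are ever touched."""
    P = {}
    for (r, k), v in M1.items():
        for (k2, c), w in M2.items():
            if k == k2:
                P[(r, c)] = P.get((r, c), Fraction(0)) + v * w
    return {rc: v for rc, v in P.items() if v != 0}
```

The restriction to *row*(M1) and *col*(M2) is implicit here: iterating over the stored entries visits exactly the nonzero part.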

# **3 UDPN, Reachability and Its Variants: Our Main Results**

Unordered data Petri nets extend the classical model of Petri nets by allowing each token to hold a data value from a countably infinite domain D. Our definition is closest to that of ν-Petri nets from [28]. For simplicity we choose this one instead of the equivalent but more complex one from [18].

**Definition 1.** *Let* D *be a countably infinite set. An unordered data Petri net (UDPN) over domain* D *is a tuple* (P, T, F, *Var* ) *where* P *is a finite set of places,* T *is a finite set of transitions, Var is a finite set of variables, and* F : (P × T)∪ (<sup>T</sup> <sup>×</sup> <sup>P</sup>) <sup>→</sup> (*Var* <sup>→</sup> <sup>N</sup>) *is a flow function that assigns each place* <sup>p</sup> <sup>∈</sup> <sup>P</sup> *and transition* t ∈ T *a function over variables in Var.*

For each transition t ∈ T we define functions F(•, t), F(t, •) : *Var* → (P → N) by F(•, t)(p, x) = F(p, t)(x) and, analogously, F(t, •)(p, x) = F(t, p)(x). The *displacement* of the transition t is the function Δ(t) : *Var* → (P → Z) defined as Δ(t) := F(t, •) − F(•, t).
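A possible encoding of Definition 1 and the displacement Δ(t), with the flow stored as a dict from (place, transition) and (transition, place) pairs to variable-valuation dicts (the representation and names are our own sketch, not the paper's):

```python
from dataclasses import dataclass, field

@dataclass
class UDPN:
    places: set
    transitions: set
    variables: set
    # flow[(p, t)] and flow[(t, p)] map each variable to a natural number;
    # pairs absent from the dict denote the all-zero function.
    flow: dict = field(default_factory=dict)

    def F(self, a, b):
        return self.flow.get((a, b), {})

    def displacement(self, t):
        """Delta(t)(p, x) = F(t, p)(x) - F(p, t)(x), as a sparse (place, var) -> int dict."""
        d = {}
        for p in self.places:
            for x in self.variables:
                v = self.F(t, p).get(x, 0) - self.F(p, t).get(x, 0)
                if v:
                    d[(p, x)] = v
        return d
```

The test below instantiates the flow of Example 1 from this section.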

For X ∈ {N, Z, Q, Q+}, we define an X-*marking* as a function M : D → (P → X) that is constant 0 on all but finitely many values of D. Intuitively, M(p, α) denotes the number of *tokens* with data value α at place p. Being 0 at all but finitely many data values means that the number of tokens in any X-marking is finite. We denote the infinite set of all X-markings by M_X.

We define an X-*step* as a triple (c, t, π) with a transition t ∈ T, a *mode* π, which is an injective map π : *Var* → D, and a scalar constant c ∈ X+. An X-step (c, t, π) is *fireable* at an X-marking *i* if $i - c \cdot F(\bullet, t) \circ \pi^{-1} \in \mathcal{M}_{\mathbb{X}}$.

The X-marking *f* reached after *firing* an X-step (c, t, π) at *i* is given by $f = i + c \cdot \Delta(t) \circ \pi^{-1}$. We also say that an X-step (c, t, π), when fired, *consumes* the tokens $c \cdot F(\bullet, t) \circ \pi^{-1}$ and *produces* the tokens $c \cdot F(t, \bullet) \circ \pi^{-1}$. We define an X-run as a sequence of X-steps, represented as $\{(c_i, t_i, \pi_i)\}_{i=1}^{|\rho|}$, where (c_i, t_i, π_i) is the i-th X-step and |ρ| is the number of X-steps. A run $\rho = \{(c_i, t_i, \pi_i)\}_{i=1}^{|\rho|}$ is fireable at an X-marking *i* if, for all 1 ≤ i ≤ |ρ|, the step (c_i, t_i, π_i) is fireable at $i + \sum_{j=1}^{i-1} c_j \Delta(t_j) \circ \pi_j^{-1}$. By $i \xrightarrow{\rho}_{\mathbb{X}} f$ we denote that ρ is fireable at *i* and that firing ρ at *i* reaches the X-marking $f = i + \sum_{i=1}^{|\rho|} c_i \cdot \Delta(t_i) \circ \pi_i^{-1}$. We call (the function computed by) this sum the *effect* of the run and denote it by Δ(ρ).
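The fireability condition and the effect of a run could be computed along these lines, with a marking stored as a dict from (place, datum) pairs to rationals (a sketch; the helper names are ours, not the paper's):

```python
from fractions import Fraction

def step_effect(delta, pi, c):
    """Effect c * Delta(t) o pi^{-1} of one step: delta maps (place, var) -> int,
    pi maps var -> data value, c is the scalar coefficient."""
    eff = {}
    for (p, x), v in delta.items():
        if x in pi:
            key = (p, pi[x])
            eff[key] = eff.get(key, Fraction(0)) + c * v
    return eff

def fireable(marking, F_in, pi, c):
    """A step (c, t, pi) is fireable at i iff i - c * F(., t) o pi^{-1} has no
    negative entry (the Q+ case; over Q every step is fireable).
    F_in is F(., t) as a (place, var) -> int dict."""
    m = dict(marking)
    for (p, x), v in F_in.items():
        if x in pi:
            key = (p, pi[x])
            m[key] = m.get(key, Fraction(0)) - c * v
    return all(v >= 0 for v in m.values())

def run_effect(steps):
    """Delta(rho) = sum over the run of c_i * Delta(t_i) o pi_i^{-1}."""
    total = {}
    for delta, pi, c in steps:
        for key, v in step_effect(delta, pi, c).items():
            total[key] = total.get(key, Fraction(0)) + v
    return {k: v for k, v in total.items() if v != 0}
```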

We fix some notation for the rest of the paper. We use Greek letters α, β, γ to denote data values from the data domain D, ρ, σ to denote runs, π to denote a mode, and x, y, z to denote variables. When clear from the context, we may omit X from X-marking, X-run and just write marking, run, etc. Further, we will use letters in bold, e.g., *m*, to denote markings, where *i*, *f* will be used for initial and final markings, respectively. Throughout the paper, unless stated explicitly otherwise, we will refer to a UDPN N = (P, T, F, *Var*); therefore P, T, F, *Var* will denote the places, transitions, flow, and variables of this UDPN.

*Example 1.* An example of a simple UDPN N1 is given in Fig. 1. For this example, we have *P* = {*p*1*, p*2*, p*3*, p*4}, *T* = {*t*}, *Var* = {*x, y, z*}, and the flow is given by *F*(*p*1*, t*) = {*y* ↦ 1}, *F*(*p*2*, t*) = {*x* ↦ 1}, *F*(*t, p*3) = {*y* ↦ 2}, *F*(*t, p*4) = {*x* ↦ 1*, z* ↦ 1}, with every variable assigned 0 for the remaining pairs. Thus, to enable the transition, *p*1 and *p*2 must each hold one token, with different data values (since modes are injective, *x* ≠ *y*); after firing, two tokens are produced in *p*3 with the same data value as the one consumed from *p*1, and two tokens are produced in *p*4, one of which has the same data value as the one consumed from *p*2.

**Fig. 1.** A simple UDPN N<sup>1</sup>

**Definition 2.** *Given* X*-markings i, f, we say f is* X*-reachable from i if there exists an* X*-run* ρ *such that* $i \xrightarrow{\rho}_{\mathbb{X}} f$*.*

When X = N, X-reachability is the classical reachability problem, whose decidability is still unknown, while Z-reachability for UDPN is in NP [22].

In this paper we tackle Q and Q<sup>+</sup>-reachability, also called *continuous* reachability in UDPN.

The first step towards the solution is showing that if a Q+-marking *f* is Q+-reachable from a Q+-marking *i*, then there exists a Q+-run ρ which uses polynomially many data values and satisfies $i \xrightarrow{\rho}_{\mathbb{Q}^+} f$. We first formalize the set of distinct data values associated with X-markings, the data values *used* in X-runs, and the variables associated with a transition.

**Definition 3.** *For* N = (P, T, F, *Var*) *a UDPN, an* X*-marking m,* t ∈ T*, and an* X*-run* $\rho = \{(c_i, t_i, \pi_i)\}_{i=1}^{|\rho|}$*, we define*

*–* *vars*(t)*, the set of variables* x ∈ *Var* *with* F(p, t)(x) > 0 *or* F(t, p)(x) > 0 *for some* p ∈ P*;*
*–* *dval*(*m*)*, the set of data values* α ∈ D *with* *m*(p, α) ≠ 0 *for some* p ∈ P*;*
*–* *dval*(ρ) = $\bigcup_{1 \le i \le |\rho|} \pi_i(\mathit{vars}(t_i))$*, the set of data values used in* ρ*.*
With this we state the first main result of this paper, which provides a bound on witnesses of Q, Q<sup>+</sup>-reachability, and is proved in Sect. 5.

**Theorem 1.** *For* X ∈ {Q, Q+}*, if an* X*-marking f is* X*-reachable from an initial* X*-marking i, then there is an* X*-run* ρ *such that* $i \xrightarrow{\rho}_{\mathbb{X}} f$ *and* |*dval*(ρ)| ≤ |*dval*(*i*) ∪ *dval*(*f*)| + 1 + max_{t∈T}(|*vars*(t)|)*.*

Using the above bound, we obtain a polynomial-time algorithm for Q-reachability, as detailed in Sect. 6.

**Theorem 2.** *Given* <sup>N</sup> = (P, T, F, *Var* ) *a UDPN and two* <sup>Q</sup>*-markings <sup>i</sup>, <sup>f</sup>, deciding if <sup>f</sup> is* <sup>Q</sup>*-reachable from <sup>i</sup> in* <sup>N</sup> *is in polynomial time.*

Finally, we consider continuous, i.e., Q+-reachability for UDPN. We adapt the techniques used for Q+-reachability of Petri nets without data from [11,12] to the setting with data, and obtain a characterization of Q+-reachability for UDPN in Sect. 7.1. Then, in Sect. 7.3, we show how this characterization can be combined with the above bound and compression techniques from [22] to obtain a polynomial-sized system of linear equations with implications over Q+. To do so, we require a slight transformation of the net, which is described in Sect. 7.2. This leads to our headline result, stated below.

**Theorem 3 (Continuous reachability for UDPN).** *Given a UDPN* N = (P, T, F, *Var* ) *and two* Q<sup>+</sup>*-markings i, f, deciding if f is* Q<sup>+</sup>*-reachable from i in* N *is in polynomial time.*

The rest of this paper is dedicated to proving these theorems. First, we present an equivalent formulation via matrices, which simplifies the technical arguments.

#### **4 Equivalent Formulation via Matrices**

From now on, we restrict X to a symbol denoting Q or Q+. We reformulate the definitions presented earlier in terms of matrices, since objects such as X-markings are intuitive to define as functions but difficult to operate upon.

In the following, we abuse the notation and use the same names for objects as well as matrices representing them. We remark that this is safe as all arithmetic operations on objects correspond to matching operations on matrices.

An X-marking *m* is a P × D matrix M, where M(p, α) = *m*(p, α) for all p ∈ P, α ∈ D. As a *finite representation*, we keep only the P × *dval*(*m*) matrix of nonzero columns. For a transition t ∈ T, we represent F(t, •), F(•, t) as P × *Var* matrices. Note that (t, •) is not a position in the matrix but part of its name; the entry of F(t, •) at (p, x) ∈ P × *Var* is F(t, •)(p, x) = F(t, p)(x), and similarly F(•, t)(p, x) = F(p, t)(x). For a place p ∈ *row*(F(t, •)), the row F(t, •)(p, •) is a vector in N^*Var*. Similarly, Δ(t) is a P × *Var* matrix with Δ(t)(p, x) = F(t, •)(p, x) − F(•, t)(p, x) for t ∈ T, p ∈ P, and x ∈ *Var*. Although both Δ(t) and F(•, t) are defined as P × *Var* matrices, only the columns for variables in *vars*(t) may be nonzero, so we will often iterate only over *vars*(t) instead of *Var*.

Finally, we capture a mode π : *Var* → D as a *Var* × D *permutation matrix* P. Although P need not be a square matrix, we abuse notation and still call it a permutation matrix. P represents the assignment of variables in *Var* to data values, just as π does: an entry 1 means that the corresponding variable is assigned the corresponding data value in mode π. Thus, for each mode π : *Var* → D there is a permutation matrix P_π such that for all x ∈ *Var*, α ∈ D, P_π(x, α) = 1 if π(x) = α, and P_π(x, α) = 0 otherwise. Formulating a mode as a permutation matrix has the advantage that Δ(t) ∘ π⁻¹ is captured by Δ(t) · P_π.

*Example 2.* In the UDPN N1 from Example 1, if D = {red, blue, green, black} then the initial marking *i* can be represented by the matrix *i* below and the function Δ(t) by the matrix Δ(t)


If we fire transition t with the assignment x = blue, y = green, z = black, we obtain the net depicted below (left), with marking *f* (center). The permutation matrix corresponding to the mode of the fired transition is given by the matrix P on the right. Note that the matrix *f* − *i* is indeed the matrix Δ(t) · P.

Using the representations developed so far, we can represent an X-run ρ as $\{(c_i, t_i, \mathcal{P}_i)\}_{i=1}^{|\rho|}$, where (c_i, t_i, P_i) denotes the i-th X-step, fired with coefficient c_i using transition t_i with the mode corresponding to the permutation matrix P_i. The sum of matrices $\sum_{i=1}^{|\rho|} c_i \Delta(t_i) \cdot \mathcal{P}_i$ gives the effect of the run, i.e., Δ(ρ) = *f* − *i* where $i \xrightarrow{\rho}_{\mathbb{X}} f$. The effect of an X-run ρ on a data value α is Δ(ρ)(•, α). Also, for an X-run $\rho = \{(c_i, t_i, \mathcal{P}_i)\}_{i=1}^{|\rho|}$ and k ∈ X+, define $k\rho = \{(k c_i, t_i, \mathcal{P}_i)\}_{i=1}^{|\rho|}$.
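Representing a mode as a sparse permutation matrix and computing Δ(t) · P_π might look as follows (a sketch with our own helper names; the assignment in the test is the one of Example 2):

```python
from fractions import Fraction

def perm_matrix(pi):
    """The sparse Var x D permutation matrix of an injective mode pi: Var -> D."""
    return {(x, a): Fraction(1) for x, a in pi.items()}

def apply_mode(delta, P):
    """Delta(t) . P_pi : a (place, var)-indexed matrix times a (var, datum)-indexed
    permutation matrix yields the (place, datum) effect Delta(t) o pi^{-1}."""
    out = {}
    for (p, x), v in delta.items():
        for (x2, a), w in P.items():
            if x == x2:
                out[(p, a)] = out.get((p, a), Fraction(0)) + v * w
    return out
```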

# **5 Bounding the Number of Data Values Used in** Q**- and** Q**<sup>+</sup>-runs**

We now prove the first main result of the paper, namely Theorem 1, which shows a linear upper bound on the number of data values required in a Q-run and a Q+-run. Theorem 1 is an immediate consequence of the following lemma, which states that if more than a linearly bounded number of data values is used in a Q- or Q+-run, then there is another such run that uses at least one data value fewer.

**Lemma 1.** *Let* X ∈ {Q, Q+}*. If there exists an* X*-run* σ *such that* $i \xrightarrow{\sigma}_{\mathbb{X}} f$ *and* |*dval*(σ)| > |*dval*(*i*) ∪ *dval*(*f*)| + 1 + max_{t∈T}(|*vars*(t)|)*, then there exists an* X*-run* ρ *such that* $i \xrightarrow{\rho}_{\mathbb{X}} f$ *and* |*dval*(ρ)| ≤ |*dval*(σ)| − 1*.*

Theorem 1 follows immediately by repeatedly applying this lemma. The rest of this section is devoted to proving the lemma. The central idea is to take any Q- or Q+-run between *i* and *f* and transform it into one that uses at least one data value fewer.

#### **5.1 Transformation of an** X**-run**

The transformation which we call *decrease* is defined as a combination of two separate operations on an X-run; we name them *uniformize* and *replace* and denote them by U and R respectively.


The intuition behind the decrease operation is that we would like to take two data values α and β used in the run such that the effect on both of them is **0** (they exist, as the effect on every data value not present in the initial or final configuration is **0**) and replace every usage of α by β. However, such a replacement can only be done if the two data values are not used together in a single step (indeed, a mode π cannot assign the same data value to two variables). Unfortunately, we cannot guarantee the existence of a single β that may replace α globally. We circumvent this by applying the *replace* operation separately for every step, replacing α with different data values in different steps.

But such a transformation alone would not preserve the effect of the run. To repair this we *uniformize*, i.e., guarantee that the final effect after replacing α by other data values is the same for every datum used to replace α. As the effect on α was **0**, splitting it uniformly adds **0** to the effects on the data replacing α, which is exactly what we want. We now formalize this intuition.

**The Uniformize Operator.** By ⌢ we denote the concatenation of two sequences. Although the data set D is unordered, the following definitions require access to an arbitrary but fixed linear order on its elements. The definition of the *uniformize* operator needs another operator acting on an X-step, which we call *rotate* and denote by *rot*.

**Definition 4.** *For a non-empty finite set of data values* E ⊂ D *and an* X*-step* ω = (c, t, P)*, define* rot(E, ω) = (c, t, P′) *where* P′ *is obtained from* P *as follows.*

*–* ∀α ∈ *col*(P) \ E*,* P′(•, α) = P(•, α)*.*
*–* ∀α ∈ E*,* P′(•, α) = P(•, next_E(α))*, where* next_E(α) = min({β ∈ E | β > α}) *if* |{β ∈ E | β > α}| > 0*, and* min(E) *otherwise.*

For a fixed set E, we can repeatedly apply the *rot*(E, •) operation on an X-step, which we denote by *rot*^k(E, ω), where k is the number of applications (for example, *rot*²(E, ω) = *rot*(E, *rot*(E, ω))).

**Definition 5.** *For a finite and non-empty set of data values* <sup>E</sup> <sup>⊂</sup> <sup>D</sup> *and an* <sup>X</sup>*-step* <sup>ω</sup> = (c, t,P)*, we define uniformize as follows*

$$\mathcal{U}(\mathbb{E},\omega) = \operatorname{rot}^0\!\left(\mathbb{E},\tfrac{\omega}{|\mathbb{E}|}\right) \frown \operatorname{rot}^1\!\left(\mathbb{E},\tfrac{\omega}{|\mathbb{E}|}\right) \frown \operatorname{rot}^2\!\left(\mathbb{E},\tfrac{\omega}{|\mathbb{E}|}\right) \frown \dots \frown \operatorname{rot}^{|\mathbb{E}|-1}\!\left(\mathbb{E},\tfrac{\omega}{|\mathbb{E}|}\right),$$

where ω/|E| stands for the X-step (c/|E|, t, P).
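The rotate and uniformize operators of Definitions 4 and 5 can be sketched on the sparse permutation-matrix representation, with a step encoded as a (c, t, P) triple (our own encoding, for illustration only):

```python
from fractions import Fraction

def rot(E, step):
    """rot(E, (c, t, P)): the new column indexed by a datum a in E is the old
    column of its cyclic successor next_E(a) in a fixed linear order on E;
    columns outside E are unchanged."""
    c, t, P = step
    order = sorted(E)
    nxt = {a: order[(i + 1) % len(order)] for i, a in enumerate(order)}
    Q = {(x, a): v for (x, a), v in P.items() if a not in E}
    cols = {}  # columns of P restricted to E, grouped by column index
    for (x, a), v in P.items():
        if a in E:
            cols.setdefault(a, {})[x] = v
    for a in E:
        for x, v in cols.get(nxt[a], {}).items():
            Q[(x, a)] = v
    return (c, t, Q)

def uniformize(E, step):
    """U(E, omega): the sequence rot^0(E, omega/|E|), ..., rot^{|E|-1}(E, omega/|E|),
    where omega/|E| is the step with coefficient c/|E|."""
    c, t, P = step
    cur = (Fraction(c) / len(E), t, P)
    seq = []
    for _ in range(len(E)):
        seq.append(cur)
        cur = rot(E, cur)
    return seq
```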

An important property of uniformize is its effect on data values.

**Lemma 2.** *For a finite and non-empty set of data values* E ⊂ D *and an* X*-step* ω = (c, t, P)*, if* $i \xrightarrow{\omega}_{\mathbb{Q}^+} f$ *and* $i \xrightarrow{\mathcal{U}(\mathbb{E},\omega)}_{\mathbb{Q}^+} f'$*, then:*

*1.* ∀α ∈ *dval*(ω) \ E*,* *f*′(•, α) − *i*(•, α) = *f*(•, α) − *i*(•, α)*;*

*2.* ∀α ∈ E*,* $f'(\bullet, \alpha) - i(\bullet, \alpha) = \frac{\sum_{\beta \in \mathbb{E}} (f(\bullet, \beta) - i(\bullet, \beta))}{|\mathbb{E}|}$*.*

This lemma tells us that the effect of the run on the initial marking is equalized for data values in E by the U operation, and is unchanged for the other data values.

**The Replace Operator.** To define the *replace* operator it is useful to introduce swap_{α,β}(P), which exchanges columns α and β in the matrix P.

**Definition 6.** *For a finite set of data values* <sup>E</sup>*, an* <sup>X</sup>*-step* <sup>ω</sup> = (c, t,P)*, and* <sup>α</sup> <sup>∈</sup> <sup>E</sup> *we define replace as follows*

$$\mathcal{R}(\alpha,\mathbb{E},\omega) = \begin{cases} (c,t,\mathcal{P}) & \text{if } (F(t,\bullet)\cdot\mathcal{P})(\bullet,\alpha) = (F(\bullet,t)\cdot\mathcal{P})(\bullet,\alpha) = \mathbf{0} \\ (c,t,\mathit{swap}_{\alpha,\beta}(\mathcal{P})) & \text{else, if } \beta \text{ is the smallest datum in } \mathbb{E} \text{ s.t.} \\ & (F(t,\bullet)\cdot\mathcal{P})(\bullet,\beta) = (F(\bullet,t)\cdot\mathcal{P})(\bullet,\beta) = \mathbf{0} \\ \text{undefined} & \text{otherwise.} \end{cases}$$

After applying the *replace* operation to every step, α is no longer used in the run, which reduces the number of data values used in the run. Observe that *replace* cannot always be applied to an X-step: it requires a zero column labelled with an element of E in the permutation matrix corresponding to the X-step.
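The replace operator of Definition 6 admits a sketch in which the zero-column test (F(t,•)·P)(•,α) = (F(•,t)·P)(•,α) = **0** is phrased as "no variable of vars(t) is mapped to the datum", which is equivalent whenever vars(t) collects exactly the variables with nonzero flow (representation and names are ours):

```python
def used(vars_t, P, a):
    """Datum a is used by the step iff some variable of t is mapped to a."""
    return any(x in vars_t and b == a for (x, b) in P)

def replace(alpha, E, step, vars_t):
    """R(alpha, E, (c, t, P)): keep the step if alpha is unused; otherwise swap
    columns alpha and beta for the smallest unused beta in E; return None
    ('undefined') if no such beta exists."""
    c, t, P = step
    if not used(vars_t, P, alpha):
        return step
    for beta in sorted(E):
        if not used(vars_t, P, beta):
            Q = {}
            for (x, a), v in P.items():
                if a == alpha:
                    a = beta
                elif a == beta:
                    a = alpha
                Q[(x, a)] = v
            return (c, t, Q)
    return None
```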

**The Decrease Transformation.** Finally, we define the transformation on an X-run between two markings which we call *decrease* and denote by *dec*.

**Definition 7.** *For two* X*-markings i, f, and an* X*-run* σ *such that* $i \xrightarrow{\sigma}_{\mathbb{X}} f$ *and* |*dval*(σ)| > |*dval*(*i*) ∪ *dval*(*f*)| + 1 + max_{t∈T}(|*vars*(t)|)*, let* {α} ∪ E = *dval*(σ) \ (*dval*(*i*) ∪ *dval*(*f*)) *with* α ∉ E*. We define decrease by dec*(E, α, σ) =

U(E, R(α, E, σ(1))) ⌢ U(E, R(α, E, σ(2))) ⌢ ... ⌢ U(E, R(α, E, σ(|σ|)))*, where* σ(j) *denotes the* j*-th* X*-step of* σ*.*

Observe that the required size of *dval*(σ) guarantees, for every application of the R operation, the existence of a β ∈ E with which α can be swapped. Note that the exchanged data value β may differ from step to step. Finally, we can analyze the *decrease* transformation and show that if the original run admits it (as given in the above definition), then the resulting sequence of transitions is a valid run of the system.

**Lemma 3.** *Let* σ *be an* X*-run such that* $i \xrightarrow{\sigma}_{\mathbb{X}} f$ *and* |*dval*(σ)| > |*dval*(*i*) ∪ *dval*(*f*)| + 1 + max_{t∈T}(|*vars*(t)|)*. Let* α ∈ *dval*(σ) \ (*dval*(*i*) ∪ *dval*(*f*)) *and* E = *dval*(σ) \ (*dval*(*i*) ∪ *dval*(*f*) ∪ {α})*. Then for* ρ = *dec*(E, α, σ)*, we obtain* $i \xrightarrow{\rho}_{\mathbb{X}} f$*.*

*Proof.* Suppose σ = σ1σ2 ... σl, where each σj = (c_j, t_j, P_j), for 1 ≤ j ≤ l, is an X-step. Then ρ = ρ1 ⌢ ... ⌢ ρl, where each ρj is the X-run defined by ρj = U(E, R(α, E, σj)). It will be useful to identify the intermediate X-markings

$$i = m_0 \xrightarrow{\sigma_1}_{\mathbb{X}} m_1 \xrightarrow{\sigma_2}_{\mathbb{X}} m_2 \xrightarrow{\sigma_3}_{\mathbb{X}} \dots \xrightarrow{\sigma_l}_{\mathbb{X}} m_l = f \tag{1}$$

$$i = m'_0 \xrightarrow{\mathcal{U}(\mathbb{E},\mathcal{R}(\alpha,\mathbb{E},\sigma_1))}_{\mathbb{Q}} m'_1 \xrightarrow{\mathcal{U}(\mathbb{E},\mathcal{R}(\alpha,\mathbb{E},\sigma_2))}_{\mathbb{Q}} m'_2 \dots \xrightarrow{\mathcal{U}(\mathbb{E},\mathcal{R}(\alpha,\mathbb{E},\sigma_l))}_{\mathbb{Q}} m'_l = f' \tag{2}$$

We split the proof: first we show that *f*′ = *f*, and then that ρ is X-fireable from *i*.

**Step 1: Showing that the final markings reached are the same.** We prove a stronger statement which implies that *f*′ = *f*, namely:

**Claim 1.** *For all* 0 ≤ j ≤ l*,*

*1.* m′_j(•, α) = **0***;*
*2.* ∀γ ∈ *dval*(*i*) ∪ *dval*(*f*)*,* m′_j(•, γ) = m_j(•, γ)*;*
*3.* ∀γ ∈ E*,* $m'_j(\bullet, \gamma) = \frac{1}{|\mathbb{E}|} \sum_{\delta \in \mathbb{E} \cup \{\alpha\}} m_j(\bullet, \delta)$*.*

The proof is by induction on j. Intuitively, point 1 holds as we shift the effect on α onto the data values in E, and point 2 holds as the transformation does not touch γ ∈ *dval*(*i*) ∪ *dval*(*f*). The last and most complicated point follows from the fact that the number of tokens consumed and produced along each segment $\xrightarrow{\mathcal{U}(\mathbb{E},\mathcal{R}(\alpha,\mathbb{E},\sigma_j))}$ is the same as for σj, but uniformized over E.

**Step 2: Showing that** ρ **is an** X**-run.** If X = Q then the run ρ is fireable, as any Q-run is fireable, so in this case this step is trivial. The case X = Q+ is more involved. As we know from Claim 1, each m′_j is a Q+-marking, so it suffices to prove that, for every j, $m'_j \xrightarrow{\mathcal{U}(\mathbb{E},\mathcal{R}(\alpha,\mathbb{E},\sigma_j))}_{\mathbb{Q}^+} m'_{j+1}$. Consider the data vector of tokens consumed along the Q+-run U(E, R(α, E, σj)). If we show that it is smaller than or equal to m′_j (component-wise), then we can conclude that U(E, R(α, E, σj)) is indeed Q+-fireable from m′_j. To show this, we examine the consumed tokens for each datum γ separately. There are three cases:


by the step σj. But we know that it is smaller than m_j(•, γ), and hence also smaller than m′_j(•, γ); the last inequality holds as m_j(•, γ) = m′_j(•, γ) according to Claim 1.

(iii) γ ∈ E. Let ω be the triple (c_j, F(•, t_j), P_j), where (c_j, t_j, P_j) = σj; ω simply describes the tokens consumed by σj. We slightly overload notation and treat the triple ω like a step whose transition consumes according to F(•, t_j) and whose production matrix is zero. We calculate the vector of consumed tokens with data value γ as follows: consumed(•, γ) =

$$\frac{1}{|\mathbb{E}|} \sum\_{k=0}^{|\mathbb{E}|-1} \Delta(rot^k(\mathbb{E}, \mathcal{R}(\alpha, \mathbb{E}, \omega)))(\bullet, \gamma) = \frac{1}{|\mathbb{E}|} \sum\_{k=0}^{|\mathbb{E}|} \Delta(rot^k(\mathbb{E} \cup \{\alpha\}, \omega))(\bullet, \gamma)$$

the first equality is from definition and the second by the *replace* operation,

$$=\frac{c\_j}{|\mathbb{E}|}\sum\_{k=0}^{|\mathbb{E}|}(rot^k(\mathbb{E}\cup\{\alpha\},(1,F(\bullet,t\_j),\mathcal{P}\_j)))(\bullet,\gamma) = \frac{c\_j}{|\mathbb{E}|}\sum\_{\delta\in\mathbb{E}\cup\{\alpha\}}(F(\bullet,t\_j)\cdot\mathcal{P}\_j)(\bullet,\delta)$$

Further, observe that as σj can be fired in m_j,

$$c\_j(F(\bullet, t\_j) \cdot \mathcal{P}\_j)(\bullet, \delta) \le \mathfrak{m}\_j(\bullet, \delta) \text{ for all } \delta \in \mathbb{D},$$

summing up over δ ∈ E ∪ {α} and multiplying by 1/|E| we get

$$\frac{1}{|\mathbb{E}|}c_j \sum_{\delta \in \mathbb{E} \cup \{\alpha\}} (F(\bullet, t_j) \cdot \mathcal{P}_j)(\bullet, \delta) \le \frac{1}{|\mathbb{E}|} \sum_{\delta \in \mathbb{E} \cup \{\alpha\}} \mathfrak{m}_j(\bullet, \delta) = \mathfrak{m}'_j(\bullet, \gamma),$$

where the last equality comes from Claim 1, point 3. Combining the inequalities we get consumed(•, γ) ≤ m′_j(•, γ).

*Proof (of Lemma* 1*).* The proof of Lemma 1 (and hence of Theorem 1) now follows immediately, since we can use the *decrease* transformation to reduce the number of data values required in an X-run. We simply take α ∈ *dval*(σ) \ (*dval*(*i*) ∪ *dval*(*f*)) and E = *dval*(σ) \ (*dval*(*i*) ∪ *dval*(*f*) ∪ {α}). Next, let ρ = *dec*(E, α, σ). By Lemma 3 we know that $i \xrightarrow{\rho}_{\mathbb{X}} f$. Moreover, observe that *dval*(ρ) ⊆ *dval*(σ). But in addition α ∉ *dval*(ρ), as by one of the properties of the *decrease* operation α does not participate in the run ρ. So *dval*(ρ) ⊊ *dval*(σ), and therefore |*dval*(ρ)| ≤ |*dval*(σ)| − 1.

# **6** Q**-reachability is in PTime**

We recall the definition of histograms from [22].

**Definition 8.** *A histogram* M *of order* q ∈ Q *is a Var* × D *matrix having nonnegative rational entries such that:*

*1.* $\sum_{\alpha \in col(M)} M(x, \alpha) = q$ *for all* x ∈ *row*(M)*.*
*2.* $\sum_{x \in row(M)} M(x, \alpha) \le q$ *for all* α ∈ *col*(M)*.*

A permutation matrix is a histogram of order 1.
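Definition 8 translates directly into a check on the sparse representation (a sketch; `is_histogram` is our own helper name):

```python
def is_histogram(M, q):
    """Check Definition 8 for a sparse Var x D matrix M (a dict (x, a) -> value):
    entries are nonnegative, every nonzero row sums to exactly q, and every
    nonzero column sums to at most q."""
    rows, cols = {}, {}
    for (x, a), v in M.items():
        if v < 0:
            return False
        rows[x] = rows.get(x, 0) + v
        cols[a] = cols.get(a, 0) + v
    return all(s == q for s in rows.values()) and all(s <= q for s in cols.values())
```

The test below also exercises closure under addition (Lemma 4(i)): the sum of two histograms of order 1 is a histogram of order 2.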

In the following lemma, we state two properties of histograms. We say that a histogram of order a is an *[*a*]-histogram* if the histogram has only {0, a} entries.

**Lemma 4.** *Let* H, H1, H2, ..., Hn *be histograms of orders* q, q1, q2, ..., qn*, respectively, with the same row dimensions. Then (i)* $\sum_{i=1}^{n} H_i$ *is a histogram of order* $\sum_{i=1}^{n} q_i$*; (ii)* H *can be decomposed as a sum of* [a_i]*-histograms such that* $\sum_i a_i = q$*.*

Using histograms we define a representation Hist(ρ) of an X-run ρ, which captures Δ(ρ). From an X-run $\rho = \{(c_j, t_j, \mathcal{P}_j)\}_{j=1}^{|\rho|}$ we obtain Hist(ρ) as follows. For each transition t ∈ T, define the set I_t = {j ∈ [1..|ρ|] | t_j = t} and the matrix $H_t = \sum_{i \in I_t} c_i \mathcal{P}_i$. Observe that since permutation matrices are histograms, and histograms are closed under scalar multiplication and addition, H_t is a histogram. If I_t is empty, then H_t is simply the null matrix. We define Hist(ρ) as the mapping from T to histograms such that t is mapped to H_t.

Analogously to an X-run, we can represent Hist(ρ) simply as {(t_j, H_{t_j})}; unlike for an X-run, we do not indicate the length of the sequence, since it depends only on the net and not on the individual run.
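Computing Hist(ρ) from a run is a single pass over the steps, accumulating c_i · P_i per transition (a sketch; `hist` is our own name):

```python
def hist(run):
    """Hist(rho): for each transition t, H_t is the sum of c_i * P_i over all
    steps (c_i, t_i, P_i) of the run with t_i = t."""
    H = {}
    for c, t, P in run:
        Ht = H.setdefault(t, {})
        for key, v in P.items():
            Ht[key] = Ht.get(key, 0) + c * v
    return H
```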

**Proposition 1.** *Let* N = (P, T, F, *Var*) *be a UDPN, i, f* X*-markings, and* σ *an* X*-run such that* $i \xrightarrow{\sigma}_{\mathbb{X}} f$*. Then for each* t ∈ T *there exists a histogram* H_t *such that:*

*1.* $f - i = \sum_{t \in T} \Delta(t) \cdot H_t$*;*
*2. col*(H_t) ⊆ *dval*(σ) *for every* t ∈ T*.*

**A PTime Procedure.** We start by observing that from any Q-marking *i*, every Q-step (c, t, P) is fireable, and hence every Q-run is fireable. This follows from the fact that the rationals are closed under subtraction, so *i* − c · F(•, t) · P is always a marking in M_Q. Thus, to find a Q-run $\rho = \{(c_j, t_j, \mathcal{P}_j)\}_{j=1}^{|\rho|}$ between two Q-markings *i*, *f*, it suffices to ensure that $f - i = \sum_{j=1}^{|\rho|} c_j \Delta(t_j) \cdot \mathcal{P}_j$. For a Q-run, all that matters is therefore the difference of markings caused by the run, which is captured succinctly by Hist(ρ) = {(t_j, H_{t_j})}. This brings us to our characterization of Q-runs.

**Lemma 5.** *Let* N = (P, T, F, *Var*) *be a UDPN. A marking f is* Q*-reachable from i iff there exist a set* E *of size* |E| ≤ |*dval*(*i*) ∪ *dval*(*f*)| + 1 + max_{t∈T}(|*vars*(t)|) *and a histogram* H_t *for each* t ∈ T *such that* $f - i = \sum_{t \in T} \Delta(t) \cdot H_t$ *and* ∀t ∈ T*, col*(H_t) ⊆ E*.*

Using this characterization we can write a system of linear inequalities to encode the condition of Lemma 5. Thus, we obtain our second main result, namely, Theorem 2, with detailed proofs in [27].

# **7** Q**<sup>+</sup>-reachability is in PTime**

Finally, we turn to Q<sup>+</sup>-reachability for UDPNs and to the proof of Theorem 3. At a high level, the proof is in three steps. We start with a characterization of Q<sup>+</sup> reachability in UDPNs. Then we present a polytime reduction of the continuous reachability problem to the same problem but for a special subclass of UDPN, called loop-less nets. Finally, we present how to encode the characterization for loop-less nets into a system of *linear equations with implications* to obtain a polytime algorithm for continuous reachability in UDPNs.

#### **7.1 Characterizing** Q**<sup>+</sup>-reachability**

We begin with a definition: the pre and post sets of an X-run. For an X-run $\rho = \{(c_i, t_i, \mathcal{P}_i)\}_{i=1}^{|\rho|}$ we define Pre(ρ) = {(p, α) | ∃i, ∃x : F(p, t_i)(x) > 0 ∧ P_i(x, α) = 1} and Post(ρ) = {(p, α) | ∃i, ∃x : F(t_i, p)(x) > 0 ∧ P_i(x, α) = 1}. Intuitively, Pre(ρ) and Post(ρ) are the sets of (place, data value) pairs (p, α) describing tokens that are consumed, respectively produced, by the run ρ.

Throughout this section, by a marking we denote a Q<sup>+</sup>-marking.

**Lemma 6.** *Let* N = (P, T, F, *Var*) *be a UDPN and let i, f be markings. For any* Q+*-run* σ *such that* $i \xrightarrow{\sigma}_{\mathbb{Q}^+} f$ *there exist markings i′ and f′ (possibly reached on a different run) such that*


*Remark 1.* If in conditions 1 and 3 we drop the requirement on the number of steps then the five conditions still imply continuous reachability.

Note that if there exist markings *i*′ and *f*′ and Q+-runs ρ, ρ′, ρ″ such that $i \xrightarrow{\rho}_{\mathbb{Q}^+} i'$, $i' \xrightarrow{\rho'}_{\mathbb{Q}^+} f'$, and $f' \xrightarrow{\rho''}_{\mathbb{Q}^+} f$, then there is a Q+-run σ such that $i \xrightarrow{\sigma}_{\mathbb{Q}^+} f$. The above characterization and its proof are obtained by adapting to the data setting the techniques developed for continuous reachability in Petri nets (without data) in [11] and [12].

#### **7.2 Transforming UDPN to Loop-less UDPN**

For a UDPN N = (P, T, F, *Var* ), we construct a UDPN N′ that is polynomial in the size of N and for which the Q<sup>+</sup>-reachability problem is equivalent. For t ∈ T, we define PrePlace(t) = {p ∈ P | ∃v ∈ *Var* s.t. F(p, t)(v) > 0} and PostPlace(t) = {p ∈ P | ∃v ∈ *Var* s.t. F(t, p)(v) > 0}. The essential property of the transformed UDPN is that for every transition the sets PrePlace and PostPlace do not intersect: a UDPN N = (P, T, F, *Var* ) is said to be *loop-less* if PrePlace(t) ∩ PostPlace(t) = ∅ for all t ∈ T.

Any UDPN can easily be transformed in polynomial time into a loop-less UDPN in a way that preserves Q<sup>+</sup>-reachability, by doubling the number of places and adding intermediate transitions. Formally, for every net N and two markings *i*, *f* one can construct, in polynomial time, a loop-less net N′ and two markings *i′*, *f′* such that *i* −→<sub>Q<sup>+</sup></sub> *f* in the net N iff *i′* −→<sub>Q<sup>+</sup></sub> *f′* in N′. The following lemma, which describes a property of loop-less nets, will be crucial for our reachability algorithm:
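To illustrate the idea of removing loops (though not the paper's exact place-doubling construction, which must also handle data), one simple variant for ordinary Petri nets without data splits each offending transition through a fresh buffer place; the names `Net`, `make_loopless`, `is_loopless` are ours:

```python
# Illustration only: splitting transitions of a (data-free) Petri net so
# that no transition both consumes from and produces into the same place.
# This is a simplified sketch, not the authors' exact construction.

from dataclasses import dataclass, field

@dataclass
class Net:
    places: set
    # transition name -> (pre, post), each a dict place -> multiplicity
    transitions: dict = field(default_factory=dict)

def is_loopless(net):
    return all(not (set(pre) & set(post))
               for pre, post in net.transitions.values())

def make_loopless(net):
    """Split each transition t with PrePlace(t) ∩ PostPlace(t) ≠ ∅ into
    t_in (consume pre, mark a fresh buffer place b_t) and t_out (consume
    b_t, produce post).  Firing t_in then t_out has the same effect as t,
    and runs between markings that leave all buffers empty correspond to
    runs of the original net."""
    out = Net(set(net.places), dict(net.transitions))
    for t, (pre, post) in net.transitions.items():
        if set(pre) & set(post):
            buf = f"b_{t}"
            out.places.add(buf)
            del out.transitions[t]
            out.transitions[f"{t}_in"] = (dict(pre), {buf: 1})
            out.transitions[f"{t}_out"] = ({buf: 1}, dict(post))
    return out
```

The split only adds one place and one transition per looping transition, so the blowup is linear.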

**Lemma 7.** *In a loop-less net, for markings i, f, if there exists a histogram* H *and a transition t* ∈ *T such that i* + Δ(t) · H = *f, then there exists a* Q<sup>+</sup>*-run* ρ *such that i* <sup>ρ</sup>−→<sub>Q<sup>+</sup></sub> *f.*

#### **7.3 Encoding** Q**<sup>+</sup>-reachability as Linear Equations with Implications**

Linear equations with implications, as we use them, are defined in [23], but were introduced in [12]. A system of linear equations with implications, also called an =⇒-system, is a finite set of linear inequalities over the same variables, together with a finite set of implications of the form x > 0 =⇒ y > 0, where x, y are variables appearing in the linear inequalities.
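The semantics just defined can be made concrete with a small checker that verifies whether a candidate non-negative rational assignment satisfies an =⇒-system; this only illustrates the definition and is not the polynomial-time solvability algorithm of Lemma 8. The encoding (coefficient dicts, `"<="`/`">="`/`"="` operators) is our own:

```python
# Checker for the semantics of =⇒-systems over Q+ (a sketch; solving such
# systems in polynomial time is Lemma 8 and is not shown here).

from fractions import Fraction

def satisfies(ineqs, impls, sol):
    """ineqs: list of (coeffs: dict var -> rational, op: str, rhs);
    impls: list of pairs (x, y) meaning x > 0 =⇒ y > 0;
    sol: dict var -> rational, interpreted as a Q+ assignment."""
    if any(v < 0 for v in sol.values()):          # only Q+ solutions count
        return False
    for coeffs, op, rhs in ineqs:
        lhs = sum(Fraction(c) * sol.get(x, 0) for x, c in coeffs.items())
        if not {"<=": lhs <= rhs, ">=": lhs >= rhs, "=": lhs == rhs}[op]:
            return False
    # every implication x > 0 =⇒ y > 0 must hold
    return all(not (sol.get(x, 0) > 0 and sol.get(y, 0) <= 0)
               for x, y in impls)
```

For instance, with the single equation x + y = 1 and the implication x > 0 =⇒ y > 0, the assignment x = y = 1/2 is a solution while x = 1, y = 0 is not.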

**Lemma 8** *[12]***.** *The* Q<sup>+</sup>*-solvability problem for an* =⇒*-system is in* PTime*.*

We then reduce the Q<sup>+</sup>-reachability problem to checking the solvability of a system of linear equations with implications, using the characterization established in Lemma 6:

**Lemma 9.** Q<sup>+</sup>*-reachability in a UDPN* N = (P, T, F, *Var* ) *between markings i*, *f can be encoded as a system of linear equations with implications in polynomial time.*

Finally, we obtain Theorem 3 as a consequence of Lemmas 8 and 9.

# **8 Conclusion**

In this paper, we provided a polynomial-time algorithm for continuous reachability in UDPNs, matching the complexity for Petri nets without data. This is in contrast to problems such as discrete coverability and termination, where Petri nets with and without data differ enormously in complexity, and to (discrete) reachability, where decidability is still open. As future work, we aim to implement the continuous reachability algorithm developed here, in order to build the first tool for discrete coverability in UDPNs along the lines of what has been done for Petri nets without data. The main obstacle will be performance evaluation, owing to the lack of benchmarks for UDPNs. Another interesting avenue for future work is continuous reachability for Petri nets with ordered data, which would allow us to analyze continuous variants of timed Petri nets.

**Acknowledgments.** We thank the anonymous reviewers for their careful reading and their helpful and insightful comments.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Optimal Satisfiability Checking for Arithmetic** *µ***-Calculi**

Daniel Hausmann<sup>(B)</sup> and Lutz Schröder

Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany *{*daniel.hausmann,lutz.schroeder*}*@fau.de

**Abstract.** The coalgebraic *<sup>µ</sup>*-calculus provides a generic semantic framework for fixpoint logics with branching types beyond the standard relational setup, e.g. probabilistic, weighted, or game-based. Previous work on the coalgebraic *µ*-calculus includes an exponential time upper bound on satisfiability checking, which however requires a well-behaved set of tableau rules for the next-step modalities. Such rules are not available in all cases of interest, in particular ones involving either integer weights as in the graded *µ*-calculus, or real-valued weights in combination with non-linear arithmetic. In the present work, we prove the same upper complexity bound under more general assumptions, specifically regarding the complexity of the (much simpler) satisfiability problem for the underlying *one-step logic*, roughly described as the nesting-free next-step fragment of the logic. The bound is realized by a generic global caching algorithm that supports on-the-fly satisfiability checking. Example applications include new exponential-time upper bounds for satisfiability checking in an extension of the graded *µ*-calculus with polynomial inequalities (including positive Presburger arithmetic), as well as an extension of the (two-valued) probabilistic *µ*-calculus with polynomial inequalities.

#### **1 Introduction**

Modal fixpoint logics are a well-established tool in the temporal specification, verification, and analysis of concurrent systems. One of the most expressive logics of this type is the modal μ-calculus [2,3,20], which features explicit least and greatest fixpoint operators; roughly speaking, these serve to specify liveness properties (least fixpoints) and safety properties (greatest fixpoints), respectively. Like most modal logics, the modal μ-calculus is traditionally interpreted over relational models such as Kripke frames or labelled transition systems. The growing interest in more expressive models where transitions are governed, e.g., by probabilities, weights, or games has sparked a commensurate growth of temporal logics and fixpoint logics interpreted over such systems; prominent examples include probabilistic μ-calculi [5,17,24], the alternating-time μ-calculus [1], and the monotone μ-calculus, which contains Parikh's game logic [28]. The graded μ-calculus [21] features next-step modalities that count successors; it is standardly interpreted over Kripke frames but, as pointed out by D'Agostino and Visser [6], graded modalities are more naturally interpreted over so-called multigraphs, where edges carry integer weights, and in fact this modification leads to better bounds on minimum model size for satisfiable formulas.

Coalgebraic logic [29,34] has emerged as a unifying framework for modal logics interpreted over such more general models. It is based on casting the transition type of the systems at hand as a set functor, and the systems in question as coalgebras for this type functor, following the paradigm of universal coalgebra [31]; additionally, modalities are interpreted as so-called *predicate liftings*. The *coalgebraic* μ*-calculus* [4] caters for fixpoint logics within this framework, and essentially covers all mentioned (two-valued) examples as instances. It has been shown that satisfiability checking in a coalgebraic μ-calculus is in ExpTime, *provided* that one exhibits a set of tableau rules for the modalities, so-called *one-step rules*, that is *tractable* in a suitable sense (an assumption made also in our own previous work on the flat [14] and alternation-free [16] fragments of the coalgebraic μ-calculus). Such rules are known for many important cases, notably including alternating-time logics, the probabilistic μ-calculus even when extended with linear inequalities, and game logic [4,22,36]. There are, however, important cases where such rule sets are currently missing, and where there is in fact little perspective for finding suitable rules. One prominent case of this kind is graded modal logic; further cases arise when logics over systems with non-negative real weights, such as probabilistic systems, are taken beyond linear arithmetic to include polynomial inequalities.

The object of the current paper is to fill this gap by proving a generic ExpTime upper bound for coalgebraic μ-calculi in the absence of tractable sets of modal tableau rules. The method we use instead is to analyse the so-called *one-step satisfiability* problem of the logic on a semantic level – this problem is essentially the satisfiability problem of a very small fragment of the logic, the *one-step logic*, which excludes not only fixpoints, but also nested next-step modalities, with a correspondingly simplified semantics that no longer involves actual transitions. E.g. the one-step logic of the relational μ-calculus is interpreted over models essentially consisting of a set with a distinguished subset, abstracting the successors of a single state that is not itself part of the model. We have applied this principle to satisfiability checking in coalgebraic (next-step) modal logics [35], coalgebraic hybrid logics [26], and reasoning with global assumptions in coalgebraic modal logics [23]. It also appears implicitly in work on automata for the coalgebraic μ-calculus [8], which however establishes only a doubly exponential upper bound in the case without tractable modal tableau rules.

Our main example applications are on the one hand the graded modal μ-calculus and its extension with (monotone) polynomial inequalities, including Presburger modalities, i.e. (monotone) linear inequalities, and on the other hand the extension of the (two-valued) probabilistic μ-calculus [4,24] with (monotone) polynomial inequalities. While the graded μ-calculus as such is known to be in ExpTime [21], the other mentioned instances of our result are, to the best of our knowledge, new. At the same time, our proofs are fairly simple, even compared to specific ones, e.g. for the graded μ-calculus.

Technically, we base our results on an automata-theoretic treatment by means of standard parity automata with singly exponential branching degree (in particular on modal steps), thus precisely enabling the singly exponential upper bound, in contrast to previous work in [8] where the introduced Λ-automata lead to doubly exponential branching on modal steps in the resulting satisfiability games. Our algorithm witnessing the singly exponential time bound is, in fact, a global caching algorithm [11,12], and is able to decide the satisfiability of nodes on-the-fly, that is, possibly before the tableau is fully expanded, thus offering a perspective for practically feasible reasoning. A side result of our approach is a criterion for a polynomial bound on branching in models, which holds in all our examples.

*Organization.* In Sect. 2, we recall the basics of the coalgebraic μ-calculus. We outline our automata-theoretic approach in Sect. 3, and present the global caching algorithm and its runtime analysis in Sect. 4. Soundness and completeness of the algorithm are proved in Sect. 5.

# **2 The Coalgebraic** *µ***-Calculus**

We recall basic definitions in coalgebraic logic [29,34] and the coalgebraic μ-calculus [4].

*Syntax.* We fix a *modal similarity type* Λ, that is, a set of modal operators with assigned finite arities, possibly including propositional atoms as nullary modalities. For readability, we restrict the technical development to unary modalities, noting that all proofs generalize to higher arities by just writing more indices; in fact, we will liberally use higher arities in examples. We assume that Λ is closed under duals, i.e., that each modal operator ♥ ∈ Λ comes with a *dual* ♥̄ ∈ Λ, and that taking duals is involutive (the dual of ♥̄ is ♥ for all ♥ ∈ Λ). Let **V** be an infinite set of *fixpoint variables*. Formulas of the *coalgebraic* μ*-calculus* (over Λ) are given by the grammar

$$
\psi, \phi ::= \bot \mid \top \mid \psi \land \phi \mid \psi \lor \phi \mid \heartsuit \phi \mid X \mid \mu X. \psi \mid \nu X. \psi \qquad \qquad \heartsuit \in \Lambda, X \in \mathbf{V}.
$$

As usual, μ and ν take least and greatest fixpoints, respectively. Negation is not included but can be defined as usual. Throughout, we use η ∈ {μ, ν} as a placeholder for fixpoint operators; we briefly refer to formulas of the form ηX. φ as *fixpoints*. Fixpoint operators *bind* their fixpoint variables, so that we have standard notions of bound and free fixpoint variables; a formula is closed if it contains no free fixpoint variables. We assume w.l.o.g. that all formulas are *clean*, i.e. each fixpoint variable appears in at most one fixpoint operator, and *irredundant*, i.e. each bound variable is used at least once. Moreover, we restrict to *guarded* formulas, in which all occurrences of fixpoint variables are separated by at least one modal operator from their binding fixpoint operator (this is standard although possibly not w.l.o.g. [9]). For ♥ ∈ Λ, we denote by size(♥) the length of a suitable representation of ♥; for natural or rational numbers indexing ♥, we assume binary representation. The *length* |ψ| of a formula ψ is its length over the alphabet {⊥, ⊤, ∧, ∨} ∪ Λ ∪ **V** ∪ {ηX. | X ∈ **V**}, while the *size* size(ψ) of ψ is defined by counting size(♥) for each ♥ ∈ Λ (and 1 for all other operators). The *alternation depth* ad(ηX.ψ) of a fixpoint ηX.ψ is the maximal depth of nesting of alternating least and greatest fixpoints in ψ that depend on X, tweaked to be *even* for least fixpoint formulas and *odd* for greatest fixpoint formulas (that is, starting with ad(μX.ψ) = 2 and ad(νX.ψ) = 1 for closed ψ). For a more detailed definition of various flavours of alternation depth, see e.g. [27].

*Semantics.* As indicated above, the branching type of the underlying systems is a parameter of the framework, given by fixing a **Set**-endofunctor T. Elements of T U should be thought of as structured collections over U that serve as collections of successors of states – e.g. in the most basic example, classical relational systems, T is the powerset functor P. Formulas are then interpreted over T-*coalgebras* (C, ξ) consisting of a set C of *states* and a *transition function* ξ : C → T C that assigns a structured collection ξ(x) ∈ T C of successors (and observations) to each x ∈ C; e.g. P-coalgebras are just Kripke frames, as they assign a set of successors to each state. We interpret each modal operator ♥ ∈ Λ as a T-*predicate lifting* [[♥]], that is, a natural transformation [[♥]] : Q → Q ∘ T<sup>op</sup>, where Q : **Set**<sup>op</sup> → **Set** denotes the contravariant powerset functor. Predicate liftings thus are families of functions [[♥]]<sub>U</sub> : Q(U) → Q(T U) satisfying *naturality*, i.e. [[♥]]<sub>U</sub>(f<sup>−1</sup>[A]) = (T f)<sup>−1</sup>[[[♥]]<sub>V</sub>(A)] for f : U → V and A ⊆ V, where f<sup>−1</sup> denotes preimage. E.g. the standard relational box modality is interpreted by [[□]]<sub>U</sub>(A) = {B ∈ P(U) | B ⊆ A}. For sets U ⊆ V, we write Ā = V \ A for the *complement* of a subset A in V when V is understood from the context. We require that duality of modal operators is respected, i.e. [[♥̄]]<sub>U</sub>(A) is the complement of [[♥]]<sub>U</sub>(Ā) for A ⊆ U. To ensure existence of fixpoints, we require that all [[♥]] are *monotone*, i.e. A ⊆ B ⊆ U implies [[♥]]<sub>U</sub>(A) ⊆ [[♥]]<sub>U</sub>(B).

A *valuation* is a partial function i : **V** → P(C) that assigns sets i(X) of states to fixpoint variables X. The *extension* [[φ]]<sub>i</sub> ⊆ C of a formula φ in a T-coalgebra (C, ξ) is defined by the expected clauses for propositional operators and

$$\begin{aligned} [\![\heartsuit\psi]\!]_i &= \xi^{-1}[[\![\heartsuit]\!]_C([\![\psi]\!]_i)] & [\![\mu X.\psi]\!]_i &= \mathsf{LFP}([\![\psi]\!]_i^X) \\ [\![X]\!]_i &= i(X) & [\![\nu X.\psi]\!]_i &= \mathsf{GFP}([\![\psi]\!]_i^X), \end{aligned}$$

where LFP and GFP compute the least and greatest fixpoints of their argument functions, respectively, where [[ψ]]<sub>i</sub><sup>X</sup>(A) = [[ψ]]<sub>i[X↦A]</sub> for A ⊆ C, and where (i[X ↦ A])(X) = A and (i[X ↦ A])(Y) = i(Y) for Y ≠ X. In particular, the extension is invariant under *unfolding* of fixpoints, i.e. [[ηX. ψ]]<sub>i</sub> = [[ψ[X ↦ ηX. ψ]]]<sub>i</sub>. For closed formulas ψ, the valuation i is irrelevant, so we write [[ψ]] instead of [[ψ]]<sub>i</sub>. A state x ∈ C *satisfies* a closed formula ψ (denoted x |= ψ) if x ∈ [[ψ]]. Given a set Z, we define the set Λ(Z) = {♥z | ♥ ∈ Λ, z ∈ Z} of *modal literals* (over Z). A closed formula χ is *satisfiable* if there is a coalgebra (C, ξ) and a state x ∈ C such that x |= χ.
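On a finite coalgebra, LFP and GFP can be computed by Kleene iteration (from ∅ upward for μ, from C downward for ν), since the argument functions are monotone. A minimal sketch for the relational case (T = powerset), with a tuple encoding of formulas that is our own convention:

```python
# Kleene-iteration evaluation of μ-calculus extensions on a finite Kripke
# frame.  Formulas are nested tuples: ("atom", p), ("var", X),
# ("and"/"or", f, g), ("dia"/"box", f), ("mu"/"nu", X, body).

def extension(phi, states, succ, label, val=None):
    val = val or {}
    op = phi[0]
    if op == "atom": return {x for x in states if phi[1] in label[x]}
    if op == "var":  return set(val[phi[1]])
    if op == "and":  return extension(phi[1], states, succ, label, val) & \
                            extension(phi[2], states, succ, label, val)
    if op == "or":   return extension(phi[1], states, succ, label, val) | \
                            extension(phi[2], states, succ, label, val)
    if op == "dia":
        A = extension(phi[1], states, succ, label, val)
        return {x for x in states if succ[x] & A}
    if op == "box":
        A = extension(phi[1], states, succ, label, val)
        return {x for x in states if succ[x] <= A}
    if op in ("mu", "nu"):               # Kleene iteration to the fixpoint
        X, body = phi[1], phi[2]
        A = set() if op == "mu" else set(states)
        while True:
            B = extension(body, states, succ, label, {**val, X: A})
            if B == A:
                return A
            A = B
```

E.g. on the chain 0 → 1 → 2 with p holding at state 2, the formula μX.(p ∨ ♦X) evaluates to all three states, while νX.(p ∧ □X) evaluates to {2}.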

**Example 1.** We now detail several instances of the coalgebraic μ-calculus; for further examples, e.g. the alternating-time μ-calculus, see [4].

1. To obtain the standard modal μ-calculus [19] (which contains CTL as a fragment), we take Λ = {♦, □} ∪ P where P is a set of propositional atoms, seen as nullary modalities. The semantics is captured by T U = P(U) × P(P), so that T-coalgebras are Kripke models, as they assign to each state a set of successors and a set of atoms satisfied in the state. The relevant predicate liftings are

$$[\![\Diamond]\!]_U(A) = \{(B,Q) \in TU \mid A \cap B \neq \emptyset\} \qquad [\![\Box]\!]_U(A) = \{(B,Q) \in TU \mid B \subseteq A\}$$

and [[p]]<sub>U</sub> = {(B, Q) ∈ T U | p ∈ Q}, a nullary predicate lifting. Standard example formulas include the CTL-formula AF p = μX.(p ∨ □X), which states that on all paths, p eventually holds, and the fairness formula νX. μY.((p ∧ ♦X) ∨ ♦Y), which asserts the existence of a path on which p holds infinitely often.

2. We interpret the *graded* μ*-calculus* [21] over multigraphs [6], i.e. T-coalgebras for the multiset functor T = B, defined by

$$\mathcal{B}(U) = \{ \theta : U \to \mathbb{N} \cup \{\infty\} \} \qquad \mathcal{B}(f)(\theta)(v) = \sum\_{u \in U \mid f(u) = v} \theta(u)$$

for sets U, V and functions f : U → V, θ : U → N ∪ {∞}. Thus B-coalgebras (C, ξ) assign multisets ξ(x) to states x ∈ C, with the intuition that x has y ∈ C as successor with multiplicity m if ξ(x)(y) = m. We use the modal similarity type Λ = {⟨m⟩, [m] | m ∈ N ∪ {∞}} and define the predicate liftings

$$[\![\langle m\rangle]\!]_U(A) = \{\theta \in \mathcal{B}(U) \mid \theta(A) > m\} \qquad [\![[m]]\!]_U(A) = \{\theta \in \mathcal{B}(U) \mid \theta(\overline{A}) \le m\}$$

for sets U and A ⊆ U, where θ(A) = Σ<sub>a∈A</sub> θ(a). E.g. a state satisfies νX.(ψ ∧ ⟨1⟩X) if it is the root of an infinite binary tree in which ψ is satisfied globally.

3. Similarly, the two-valued *probabilistic* μ*-calculus* [4,24] is obtained by using the distribution functor T = D that maps sets U to probability distributions over U with countable support, defined by

$$\mathcal{D}(U) = \{d: U \to (\mathbb{Q} \cap [0, 1]) \mid \sum\_{u \in U} d(u) = 1\}.$$

Then T-coalgebras are just Markov chains. We use the modal similarity type Λ = {⟨p⟩, [p] | p ∈ Q ∩ [0, 1]} and define the predicate liftings

$$[\![\langle p\rangle]\!]_U(A) = \{d \in \mathcal{D}(U) \mid d(A) > p\} \qquad [\![[p]]\!]_U(A) = \{d \in \mathcal{D}(U) \mid d(\overline{A}) \le p\},$$

for sets U and A ⊆ U, where again d(A) = Σ<sub>a∈A</sub> d(a).

4. We interpret the *graded* μ*-calculus with polynomial inequalities* over the semantic domain from item 2 (i.e. multigraphs). We put Λ = {L<sub>p,b</sub>, M<sub>p,b</sub> | p ∈ N<sub>>0</sub>[X<sub>1</sub>,...,X<sub>n</sub>], b, n ∈ N} (that is, p ranges over multivariate polynomials with positive integer coefficients) and define the predicate liftings

$$\begin{aligned} [\![L_{p,b}]\!]_U(A_1,\ldots,A_n) &= \{\theta \in \mathcal{B}(U) \mid p(\theta(A_1),\ldots,\theta(A_n)) > b\}, \\ [\![M_{p,b}]\!]_U(A_1,\ldots,A_n) &= \{\theta \in \mathcal{B}(U) \mid p(\theta(\overline{A_1}),\ldots,\theta(\overline{A_n})) \le b\}, \end{aligned}$$

for sets U and A<sub>1</sub>,...,A<sub>n</sub> ⊆ U, where θ(A) = Σ<sub>a∈A</sub> θ(a). This logic subsumes the *Presburger* μ*-calculus*, that is, the extension of the graded μ-calculus with (monotone) linear inequalities, which may be seen as the fixpoint variant of *Presburger modal logic* [7]. E.g. the formula μY.(r ∨ L<sub>2X<sub>1</sub>+X<sub>2</sub><sup>2</sup>, 2</sub>(p ∧ Y, q ∧ Y)) says that the current state is the root of a finite tree all whose leaves satisfy r, and each of whose inner nodes has n<sub>1</sub> children satisfying p and n<sub>2</sub> children satisfying q where 2n<sub>1</sub> + n<sub>2</sub><sup>2</sup> > 2. There is an apparent coding of the logic into the graded μ-calculus, which however incurs exponential blowup.

5. Similarly, we use the semantic domain from item 3, Markov chains, to obtain the *probabilistic* μ*-calculus with polynomial inequalities* [23]: We put Λ = {L<sub>p,b</sub>, M<sub>p,b</sub> | p ∈ Q<sub>>0</sub>[X<sub>1</sub>,...,X<sub>n</sub>], b ∈ Q<sub>≥0</sub>, n ∈ N} (i.e. p ranges over polynomials) and

$$\begin{aligned} [\![L_{p,b}]\!]_U(A_1,\ldots,A_n) &= \{d \in \mathcal{D}(U) \mid p(d(A_1),\ldots,d(A_n)) > b\}, \\ [\![M_{p,b}]\!]_U(A_1,\ldots,A_n) &= \{d \in \mathcal{D}(U) \mid p(d(\overline{A_1}),\ldots,d(\overline{A_n})) \le b\} \end{aligned}$$

for sets U and A<sub>1</sub>,...,A<sub>n</sub> ⊆ U. This logic presumably does not encode into the probabilistic μ-calculus of item 3 above, and can express constraints on independent products of events (see also [25]). E.g. the formula νY. L<sub>X<sub>1</sub>X<sub>2</sub>, 0.9</sub>(p ∧ Y, q ∧ Y) says roughly that two independently sampled successors of the current state will satisfy p and q, respectively, and then satisfy the same property again, with probability at least 0.9.

(The modalities in the last two items are inevitably less general than in the corresponding next-step logics [7,23], due to the need to ensure monotonicity.)

#### **3 Tracking Automata**

We use *parity automata* (e.g. [13]) that track single formulas along paths through potential models to decide whether it is possible to construct a model in which all least fixpoint formulas are eventually satisfied. Formally, (nondeterministic) parity automata are tuples A = (V, Σ, Δ, q<sub>0</sub>, α) where V is a set of *nodes*; Σ is a finite set, the *alphabet*; Δ ⊆ V × Σ × V is the *transition relation*, assigning a set Δ(v, a) = {u | (v, a, u) ∈ Δ} of nodes to all v ∈ V and a ∈ Σ; q<sub>0</sub> ∈ V is the *initial node*; and α : Δ → N is the *priority function*, assigning priorities α(v, a, u) ∈ N to *transitions* (v, a, u) ∈ Δ (assigning priorities to transitions rather than nodes is standard in recent work since it yields slightly more succinct automata). If Δ is a (partial) functional relation, then A is said to be *deterministic*, and we denote the corresponding partial function by δ : V × Σ → V. The automaton A *accepts* an infinite word w = w<sub>0</sub>w<sub>1</sub>... ∈ Σ<sup>ω</sup> if there is a w-path through A on which the highest priority that is passed infinitely often is even; formally, the language accepted by A is L(A) = {w ∈ Σ<sup>ω</sup> | ∃ρ ∈ run(A, w). max(Inf(α ∘ ρ)) is even}, where run(A, w) denotes the set of infinite sequences (ρ<sub>0</sub>, w<sub>0</sub>, ρ<sub>1</sub>), (ρ<sub>1</sub>, w<sub>1</sub>, ρ<sub>2</sub>), ... ∈ Δ<sup>ω</sup> such that ρ<sub>0</sub> = q<sub>0</sub>, and where, given an infinite sequence S, Inf(S) denotes the set of elements that occur infinitely often in S.
Here, we see infinite sequences ρ ∈ U<sup>ω</sup> over some set U as functions N → U and write ρ<sub>i</sub> to denote the i-th element of ρ.
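For deterministic automata and ultimately periodic words u·v<sup>ω</sup> the parity condition becomes effectively checkable: run u, then pump copies of v until a state repeats at a copy boundary, and inspect the maximal priority on the resulting loop. A minimal sketch with our own encoding of δ and α (v must be nonempty):

```python
# Acceptance of an ultimately periodic word u·v^ω by a *deterministic*
# parity automaton with priorities on transitions, as in the text.
# A sketch; delta: dict (state, letter) -> state,
# alpha: dict (state, letter, state) -> priority.

def accepts_lasso(delta, alpha, q0, u, v):
    q = q0
    for a in u:                       # consume the finite prefix u
        q = delta[(q, a)]
    seen = {}                         # state at the start of each copy of v
    tops = []                         # max priority inside each copy of v
    while q not in seen:              # pump v until a boundary state repeats
        seen[q] = len(tops)
        top = 0
        for a in v:
            top = max(top, alpha[(q, a, delta[(q, a)])])
            q = delta[(q, a)]
        tops.append(top)
    # the copies from the first occurrence of q onwards repeat forever
    return max(tops[seen[q]:]) % 2 == 0
```

Since the automaton is deterministic and has finitely many states, a boundary state must repeat after at most |V| copies of v, and the maximal priority on the repeating segment is exactly the maximal priority passed infinitely often.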

We now *fix a target formula* χ and put n<sub>0</sub> = |χ|, n<sub>1</sub> = size(χ). We let **F** denote the *Fischer-Ladner closure* [20] of χ; i.e. **F** contains all formulas that can arise as subformulas when unfolding each fixpoint in χ exactly once. We put k = max{ad(ψ) | ψ ∈ **F**} and selections = P(**F** ∩ Λ(**F**)) (**F** ∩ Λ(**F**) is the set of modal literals in **F**). We have |**F**| ≤ n<sub>0</sub> and hence |selections| ≤ 2<sup>n<sub>0</sub></sup>.

**Definition 2 (Tracking automaton).** The *tracking automaton* for χ is a nondeterministic parity automaton A<sub>χ</sub> = (**F**, Σ, Δ, q<sub>0</sub>, α), where q<sub>0</sub> = χ,

$$\Sigma = \{(\psi_0 \lor \psi_1, b) \in \mathbf{F} \times \{0,1\}\} \cup \{(\psi_0 \land \psi_1, 0) \in \mathbf{F} \times \{0\}\} \cup \{(\eta X.\psi_1, 0) \in \mathbf{F} \times \{0\}\} \cup \mathsf{selections},$$

and for ψ, ψ<sub>0</sub>, ψ<sub>1</sub> ∈ **F**, κ ∈ selections and b ∈ {0, 1},

$$\Delta(\psi,\kappa) = \{\psi_0 \in \mathbf{F} \mid \psi \in \kappa \cap \Lambda(\{\psi_0\})\}$$

$$\Delta(\psi,(\psi\_0 \vee \psi\_1, b)) = \{\psi\_b \mid \psi = \psi\_0 \vee \psi\_1\} \cup \{\psi \mid \psi \neq \psi\_0 \vee \psi\_1\}$$

$$\Delta(\psi,(\psi\_0 \wedge \psi\_1, 0)) = \{\psi\_0, \psi\_1 \mid \psi = \psi\_0 \wedge \psi\_1\} \cup \{\psi \mid \psi \neq \psi\_0 \wedge \psi\_1\}$$

$$\Delta(\psi,(\eta X.\psi\_1, 0)) = \{\psi\_1[X \mapsto \psi] \mid \psi = \eta X.\psi\_1\} \cup \{\psi \mid \psi \neq \eta X.\psi\_1\}$$

E.g. the last clause means that when tracking the unfolding of a fixpoint ηX. ψ<sub>1</sub> at ψ, we track ψ to the unfolding ψ<sub>1</sub>[X ↦ ψ] if ψ equals the fixpoint being unfolded, and to ψ itself otherwise; similarly for the other clauses, and in particular a modal literal ψ = ♥ψ<sub>0</sub> is only tracked to ψ<sub>0</sub> through a selection κ if ♥ψ<sub>0</sub> ∈ κ, i.e. if κ selects ♥ψ<sub>0</sub> to be tracked. The priority function α is derived from the alternation depths of formulas, counting only unfoldings of fixpoints (i.e. all other transitions have priority 1). Formally, α(ψ, σ, ψ′) = 1 if ψ = ψ′ or if ψ is not a fixpoint; if ψ is a fixpoint and ψ ≠ ψ′, then we put α(ψ, σ, ψ′) = ad(ψ).
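The transition clauses of Definition 2 can be transcribed almost literally. The following sketch uses a tuple encoding of formulas that is our own convention (modal literals as `("mod", hv, f0)`, selections as frozensets of such literals):

```python
# The transition clauses of Definition 2, transcribed directly.  Formulas
# are nested tuples -- ("or", f0, f1), ("and", f0, f1), ("mu"/"nu", X, body),
# ("var", X), ("atom", p), ("mod", hv, f0) -- an encoding of ours.

def subst(phi, X, rep):
    """Substitute rep for the free fixpoint variable X in phi."""
    op = phi[0]
    if op == "var":
        return rep if phi[1] == X else phi
    if op == "atom":
        return phi
    if op in ("mu", "nu"):
        # clean formulas never rebind X, but guard anyway
        return phi if phi[1] == X else (op, phi[1], subst(phi[2], X, rep))
    if op == "mod":
        return ("mod", phi[1], subst(phi[2], X, rep))
    return (op, subst(phi[1], X, rep), subst(phi[2], X, rep))

def tracked(psi, letter):
    """The successor set Δ(ψ, letter) of the tracking automaton."""
    if isinstance(letter, frozenset):        # modal step via a selection κ
        return {psi[2]} if psi[0] == "mod" and psi in letter else set()
    phi, b = letter
    if psi != phi:                           # ψ is unaffected by this letter
        return {psi}
    if phi[0] == "or":                       # choose the disjunct given by b
        return {phi[1 + b]}
    if phi[0] == "and":                      # track into both conjuncts
        return {phi[1], phi[2]}
    if phi[0] in ("mu", "nu"):               # unfold the fixpoint
        return {subst(phi[2], phi[1], phi)}
    return {psi}
```

Note that a modal literal not selected by κ has no successor, so its trace simply ends, matching the first clause of the definition.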

Intuitively, words from Σ<sup>ω</sup> encode infinite paths through coalgebras (C, ξ) in which states x ∈ C are labelled with sets l(x) of formulas, where letters κ ∈ selections encode modal steps from states x ∈ C with label l(x) to states y ∈ C with label {ψ | ♥ψ ∈ κ ∩ l(x)}. The automaton is built to accept L(A<sub>χ</sub>) = BadBranch<sub>χ</sub>, where BadBranch<sub>χ</sub> is the set of words that encode a path on which a least fixpoint formula ψ is unfolded infinitely often without being dominated by any outer fixpoint formula (i.e. one with alternation depth greater than ad(ψ)). Letters (ψ<sub>0</sub> ∨ ψ<sub>1</sub>, b) choose disjuncts according to b, while for letters (ψ<sub>0</sub> ∧ ψ<sub>1</sub>, 0), the tracking automaton is nondeterministic, reflecting the fact that bad fixpoints can reside in either ψ<sub>0</sub> or ψ<sub>1</sub>. The automaton A<sub>χ</sub> has size n<sub>0</sub> and priorities 1 to k. Using a standard construction (e.g. [18]), we transform A<sub>χ</sub> into an equivalent Büchi automaton of size n<sub>0</sub>k. Then we determinize the Büchi automaton using, e.g., the Safra/Piterman construction [30,32] and obtain an equivalent deterministic parity automaton with priorities 0 to 2n<sub>0</sub>k − 1 and size O(((n<sub>0</sub>k)!)<sup>2</sup>). Finally we complement this parity automaton by increasing every priority by 1, obtaining a deterministic parity automaton B<sub>χ</sub> = (D<sub>χ</sub>, Σ, δ, v<sub>0</sub>, β) of size O(((n<sub>0</sub>k)!)<sup>2</sup>), with priorities 1 to 2n<sub>0</sub>k and with

$$L(\mathsf{B}_\chi) = \overline{L(\mathsf{A}_\chi)} = \overline{\mathsf{BadBranch}_\chi} =: \mathsf{GoodBranch}_\chi,$$

i.e. B<sub>χ</sub> is a deterministic parity automaton that accepts the words that encode paths along which satisfaction of least fixpoints is never deferred indefinitely. We define a labelling function l : D<sub>χ</sub> → P(**F**) mapping each state of B<sub>χ</sub> (e.g. a Safra tree) to the set of formulas occurring in it.

**Remark 3.** It has been noted that the standard tracking automata for *alternation-free* formulas are, in fact, co-Büchi automata [10,16] and that the tracking automata for *aconjunctive* formulas are *limit-deterministic* parity automata [15]. These considerably simpler automata can be determinized to deterministic Büchi automata of size 3<sup>n<sub>0</sub></sup> and to deterministic parity automata of size O((n<sub>0</sub>k)!) with 2n<sub>0</sub>k priorities, respectively. This observation holds true for the tracking automata in this work as well, so that for formulas of suitable syntactic shape, Lemma 11 below yields correspondingly lower bounds on the runtime of our satisfiability checking algorithm.

# **4 Global Caching for the Coalgebraic** *µ***-Calculus**

We now introduce a generic global caching algorithm for satisfiability in the coalgebraic μ-calculus. Given an input formula χ, the algorithm expands the determinized and complemented tracking automaton B<sub>χ</sub> step by step and propagates (un)satisfiability through this graph; the algorithm terminates as soon as the initial node v<sub>0</sub> is marked as (un)satisfiable. The algorithm bears similarity to standard game-based algorithms for μ-calculi [8,9,15]; however, it crucially deviates from these algorithms in the treatment of modal steps: Intuitively, our algorithm decides whether it is possible to remove some of the modal transitions, as well as one of the transitions from each reachable pair ((ψ<sub>0</sub> ∨ ψ<sub>1</sub>), 0), ((ψ<sub>0</sub> ∨ ψ<sub>1</sub>), 1) of disjunction transitions, within the automaton B<sub>χ</sub> in such a way that the resulting sub-automaton of B<sub>χ</sub> is totally accepting, that is, accepts every word for which there is an infinite run. In doing so, it is crucial that the labels of state nodes v in the reduced automaton are *one-step satisfiable*, in a sense introduced next, in the set of states that are reachable from v by the remaining modal transitions. Propagating (un)satisfiability over modal transitions thus involves *one-step satisfiability checking*, a functor-specific problem that in many instances can be solved in time singly exponential in size(χ). In previous work [8], a variant of one-step satisfiability has been used in satisfiability games for coalgebraic μ-calculi, which however leads to a doubly exponential number of modal moves for one of the players and hence does not yield a singly exponential upper bound on satisfiability checking (unless a suitable set of tableau rules is provided).

**Definition 4 (One-step satisfiability problem** [26,33,35]**).** Let V be a finite set, let v ⊆ Λ(V) such that a = b whenever ♥<sub>1</sub>a, ♥<sub>2</sub>b ∈ v, and let U ⊆ P(V). The *one-step satisfiability problem* for inputs v and U is to decide whether T U ∩ [[v]]<sub>1</sub> ≠ ∅, where

$$[\![v]\!]_1 = \bigcap_{\heartsuit a \in v} [\![\heartsuit]\!]_U\{u \in U \mid a \in u\}.$$

We put size(v) = Σ<sub>♥a∈v</sub> size(♥), and denote by t(a, b) the time it takes to solve the problem on v, U with size(v) = a and |V | = b (hence |U| ≤ 2<sup>b</sup>).

**Remark 5.** We keep the definition of the actual one-step logic as mentioned in the introduction somewhat implicit in the above definition of the one-step satisfiability problem. One can see that it contains two layers: a purely propositional layer embodied in U, which postulates which propositional formulas over V are satisfiable; and a modal layer with nesting depth of modalities uniformly equal to 1, embodied in the set v, which specifies constraints on an element of T U.

**Example 6.** For the standard modal μ-calculus (Example 1.1), the one-step satisfiability problem is to decide for given v ⊆ Λ(V ) and U ⊆ P(V ) whether there is A ∈ P(U) ∩ [[v]]<sub>1</sub>, that is, a subset A ⊆ U such that for each ♦a ∈ v, there is u ∈ A such that a ∈ u, and for each □a ∈ v and each u ∈ A, a ∈ u. Here we have t(a, b) ≤ a · 2<sup>b</sup> where a = size(v), b = |V |. For the graded μ-calculus (Example 1.2), the one-step satisfiability problem is to decide for v, U as above whether there is a multiset θ ∈ B(U) such that Σ<sub>u∈U, a∈u</sub> θ(u) > m for each ⟨m⟩a ∈ v and Σ<sub>u∈U, a∉u</sub> θ(u) ≤ m for each [m]a ∈ v.
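To make the relational case concrete, the following sketch (our own illustration; the encoding of modal literals as tagged pairs is an assumption, not the paper's notation) decides the one-step problem for ♦/□ within the a · 2<sup>b</sup> bound stated above: it suffices to test the largest candidate set A, since enlarging A preserves box constraints and only adds diamond witnesses.

```python
def one_step_sat(v, U):
    """One-step satisfiability check for the relational modalities.

    v -- iterable of ('dia', a) / ('box', a) pairs, modal literals over V
    U -- iterable of frozensets over V (the propositionally satisfiable sets)

    Some A as in Example 6 exists iff the largest candidate -- all u in U
    meeting every box constraint -- witnesses every diamond.
    """
    boxes = {a for (m, a) in v if m == 'box'}
    dias = {a for (m, a) in v if m == 'dia'}
    A = [u for u in U if boxes <= u]  # keep only u satisfying all boxes
    return all(any(a in u for u in A) for a in dias)
```

For instance, v = {♦p, □q} is one-step satisfiable over U = {{p, q}} but not over U = {{p}, {q}}, since the box constraint excludes the only p-witness.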

**Definition 7 (States and Prestates).** A node v of B<sup>χ</sup> is a *state* if its label contains only modal literals (l(v) ⊆ Λ(**F**)), and otherwise a *prestate*, in which case we fix ψ<sub>v</sub> ∈ l(v) \ Λ(**F**). We write states, prestates ⊆ Dχ for the sets of states and prestates, respectively.

We next define 2n0k-ary set functions f and g that compute one-step (un)satisfiability w.r.t. their argument sets.

**Definition 8 (One-step propagation).** For sets G ⊆ Dχ and **X** = X<sub>1</sub>, ..., X<sub>2n0k</sub> ∈ P(G)<sup>2n0k</sup>, we put

$$\begin{split} f(\mathbf{X}) = {} & \{v \in \mathbf{prestates} \mid \exists b \in \{0, 1\} .\, \delta(v, (\psi\_v, b)) \in X\_{\beta(v, (\psi\_v, b))}\} \cup {} \\ & \{v \in \mathbf{states} \mid T(\bigcup\_{1 \le i \le 2n\_0k} X\_i(v)) \cap [\![l(v)]\!]\_1 \ne \emptyset\} \\ g(\mathbf{X}) = {} & \{v \in \mathbf{prestates} \mid \forall b \in \{0, 1\} .\, \delta(v, (\psi\_v, b)) \in X\_{\beta(v, (\psi\_v, b))}\} \cup {} \\ & \{v \in \mathbf{states} \mid T(\bigcup\_{1 \le i \le 2n\_0k} \overline{X\_i}(v)) \cap [\![l(v)]\!]\_1 = \emptyset\}, \end{split}$$

where β(v,(ψv, b)) abbreviates β(v,(ψv, b), δ(v,(ψv, b))) and where

$$X\_i(v) = \{ l(u) \mid u \in X\_i,\ \exists \kappa \in \mathbf{selections}.\ \delta(v, \kappa) = u,\ \beta(v, \kappa, u) = i \}. $$

Since for states v we have l(v) ⊆ Λ(**F**) and Xi(v) ⊆ P(**F**) for all i, one-step propagation steps for states are instances of the one-step satisfiability problem with |V | = |**F**|, solvable in time t(n1, n0) because size(l(v)) ≤ n1 and |**F**| ≤ n0.

**Definition 9 (Propagation).** Given a set G, we put

$$\begin{aligned} \mathbf{E}\_G &= \eta\_{2n\_0k} X\_{2n\_0k} \dots \eta\_2 X\_{2} \,\eta\_1 X\_1 f(\mathbf{X}), \\ \mathbf{A}\_G &= \overline{\eta\_{2n\_0k}} X\_{2n\_0k} \dots \overline{\eta\_2} X\_2 \,\overline{\eta\_1} X\_1 g(\mathbf{X}), \end{aligned}$$

where **X** = X1, ..., X2n0k for Xi ⊆ G, where ηi = μ for odd i and ηi = ν for even i, and where the overline denotes dualization, that is, ν̄ = μ and μ̄ = ν.

The set **E**<sup>G</sup> contains nodes v ∈ G for which there are choices for all disjunctions and modal transitions that are reachable from v within G (as indicated at the beginning of the section) such that the labels of all reachable states in the chosen sub-automaton of B<sup>χ</sup> are one-step satisfiable and such that on all paths through the chosen sub-automaton, the highest priority that is passed infinitely often is even, the intuition being that no least fixpoint is unfolded infinitely often without being dominated. Dually, the set **A**<sup>G</sup> contains nodes for which there exist no such suitable choices.
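For intuition, alternating fixpoints of the shape used in Definition 9 can be evaluated over the finite lattice P(G) by naive Knaster–Tarski iteration. The following toy sketch is our own illustration, not the paper's implementation: a generic monotone `step` stands in for f or g, and the innermost variable X1 is iterated first, exactly mirroring the binding order of the definition.

```python
def nested_fixpoint(step, universe, kinds):
    """Evaluate eta_n X_n ... eta_1 X_1 . step(X_1, ..., X_n) over the
    finite lattice P(universe) by naive Knaster-Tarski iteration.

    step  -- monotone function from an n-tuple of frozensets to a frozenset
    kinds -- kinds[i-1] in {'mu', 'nu'} gives the quantifier binding X_i
             (X_1 is bound innermost)
    """
    def solve(i, env):
        # env = (X_{i+1}, ..., X_n); returns the value of the expression
        # with the i innermost variables still quantified.
        if i == 0:
            return step(env)
        # mu starts from bottom and ascends; nu starts from top and descends
        x = frozenset() if kinds[i - 1] == 'mu' else frozenset(universe)
        while True:
            new = solve(i - 1, (x,) + env)
            if new == x:
                return x
            x = new
    return solve(len(kinds), ())
```

Since each variable can change value at most |universe| times per iteration of the enclosing variables, this naive evaluation matches the |G|<sup>2n0k</sup>-style bound used in the proof of Lemma 11.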

We recall that v<sup>0</sup> ∈ D<sup>χ</sup> is the initial state of the determinized and complemented tracking automaton Bχ. The algorithm expands B<sup>χ</sup> step-by-step starting from v0; for prestates u, the expansion step adds nodes according to the fixed non-modal formula ψ<sup>u</sup> that is to be expanded next (Definition 7), and for states, the expansion follows all (matching) selections. The order of expansion can be chosen freely, e.g. by heuristic methods. Optional intermediate propagation steps can be used judiciously to realize on-the-fly solving.

**Algorithm 10 (Global caching).** To decide the satisfiability of the input formula χ, initialize the sets of *unexpanded* and *expanded* nodes, U = {v0} and G = ∅, respectively.


**Lemma 11.** *Algorithm 10 runs in time* O(((n0k)!)<sup>4n0k</sup> · t(n1, n0))*.*

*Proof.* The loop of the algorithm expands the determinized and complemented tracking automaton node by node and hence is executed at most |Dχ| ∈ O(((n0k)!)<sup>2</sup>) ⊆ 2<sup>O(n0k log n0)</sup> times. A single expansion step can be implemented in time O(2<sup>n0</sup>) since propositional expansion is unproblematic and for the modal expansion of a state u, all (matching) selections, of which there are at most 2<sup>n0</sup>, have to be considered. A single propagation step consists in computing two fixpoints of nesting depth 2n0k of the functions f and g over P(Dχ)<sup>2n0k</sup> and can hence be implemented in time O(|Dχ|<sup>2n0k</sup> · t(n1, n0)) ∈ O((((n0k)!)<sup>2</sup>)<sup>2n0k</sup> · t(n1, n0)) ⊆ 2<sup>O(n0<sup>2</sup>k<sup>2</sup> log n0 + log t(n1, n0))</sup>, noting that a single computation of f(**X**) and g(**X**) for a tuple **X** ∈ P(Dχ)<sup>2n0k</sup> can be implemented in time O(t(n1, n0)) – this has been noted above for states, and prestates are unproblematic. Thus the complexity of the whole algorithm is dominated by the complexity of the propagation step.

**Corollary 12.** *If the one-step satisfiability problem of a coalgebraic logic can be solved in time* t(a, b) *exponential in* a + b *on inputs* v ⊆ Λ(V )*,* U ⊆ P(V ) *with* size(v) = a*,* |V | = b*, then the satisfiability problem of the corresponding coalgebraic* μ*-calculus is in* ExpTime*.*

Since the existence of a tractable set of tableau rules implies the required time bound on one-step satisfiability, the above result subsumes earlier bounds obtained by tableau-based approaches in [4,15,16]; however, it covers additional example logics for which no suitable tableau rules are known. In particular we have

**Proposition 13.** *The satisfiability problems of the following logics are in* ExpTime*:*


(Tractable sets of tableau rules have previously been claimed for the graded [36] and Presburger [22] μ-calculus but have since been discovered to be flawed [23].)

*Proof.* It suffices to show that the respective one-step satisfiability problems can be solved on inputs v ⊆ Λ(V ), U ⊆ P(V ) with size(v) = a and |V | = b in time singly exponential in a + b, i.e. in time t(a, b) ∈ 2<sup>p(a+b)</sup> for some polynomial p. E.g. for standard relational modalities, we have t(a, b) = a · 2<sup>b</sup> = 2<sup>b+log a</sup>, see Example 6. While the bounds can be established by relatively easy arguments (e.g. using known bounds on sizes of solutions of systems of real or integer linear inequalities) for all of our remaining example logics, we import them from previous work for brevity. For the one-step satisfiability problem of graded modal logic, by [21, Lemma 1], we have t(a, b) ≤ (2 · 2<sup>a</sup> + 2)<sup>b</sup> ≤ 2<sup>ab+2b</sup>; the lemma uses counters to check joint one-step satisfiability of constraints and directly extends to the one-step satisfiability problem of graded modal logic with monotone polynomial inequalities, in which case we require n counters for each n-ary polynomial. The bound for (two-valued) probabilistic modal logic (with polynomial inequalities) is shown in [23, Example 7].
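As an elementary illustration of the graded case (a brute-force sketch of our own, not the counter-based algorithm of [21]): capping every multiplicity at the maximal grade plus one loses no solutions, since lowering a multiplicity preserves the ≤-constraints while a single capped element already exceeds every grade, so a finite search decides the graded one-step problem.

```python
from itertools import product

def graded_one_step_sat(diamonds, boxes, U):
    """Naive check of the graded one-step satisfiability problem.

    diamonds -- list of (m, a) for literals <m> a  (demand: sum > m)
    boxes    -- list of (m, a) for literals [m] a  (demand: sum <= m)
    U        -- list of frozensets over V

    Searches all multisets theta over U with entries bounded by
    (max grade) + 1; by the capping argument this is complete.
    """
    cap = max([m for m, _ in diamonds + boxes], default=0) + 1
    for theta in product(range(cap + 1), repeat=len(U)):
        ok = all(sum(t for t, u in zip(theta, U) if a in u) > m
                 for m, a in diamonds)
        ok = ok and all(sum(t for t, u in zip(theta, U) if a not in u) <= m
                        for m, a in boxes)
        if ok:
            return True
    return False
```

This search is exponential in |U| and serves only as a specification; the point of [21, Lemma 1] is precisely that the counters avoid enumerating multisets.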

**Remark 14.** We also obtain a polynomial bound on branching width in models for all our example logics simply by importing Lemma 6 and the observations in Example 7 from [23]. With the exception of the standard μ-calculus, this bound appears to be new in all our example logics. Of course, for graded and Presburger μ-calculi, polynomial branching holds only in their coalgebraic semantics, i.e. over multigraph models but not over Kripke models.

#### **5 Soundness and Completeness**

We now prove the central result, that is, the soundness and completeness of Algorithm 10. As the sets **E**<sub>G</sub> and **A**<sub>G</sub> grow monotonically with G, it suffices to prove equivalence of satisfiability and containment of the initial node v<sub>0</sub> in **E** := **E**<sub>Dχ</sub>. Our program is as follows: We show that v<sub>0</sub> ∈ **E** if and only if there is a *pre-semi-tableau* (Definition 15) for χ with *unfolding timeouts* (Definition 17), which in turn is the case if and only if χ is satisfiable. We establish the latter equivalence by constructing a model for χ from a given pre-semi-tableau with unfolding timeouts and, for the converse direction, extracting a pre-semi-tableau with unfolding timeouts from the model.

**Definition 15 (Pre-semi-tableau).** Given a ternary relation R ⊆ A × B × A and a ∈ A, b ∈ B, we generally write R(a) = {a′ ∈ A | ∃b ∈ B. (a, b, a′) ∈ R} and R(a, b) = {a′ ∈ A | (a, b, a′) ∈ R}. Let W ⊆ Dχ and put U = W ∩ prestates and V = W ∩ states. Given a ternary relation L ⊆ W × Σ × W, the pair (W, L) is a *pre-semi-tableau* for χ if the following conditions hold: L ⊆ δ; T(L(v)) ∩ [[l(v)]]<sub>1</sub> ≠ ∅ for all v ∈ V ; for each u ∈ U, there is exactly one b ∈ {0, 1} such that L(u, (ψu, b)) = {δ(u, (ψu, b))}, and for all other σ ∈ Σ, L(u, σ) = ∅; and there is no L-cycle that contains only elements from U. A *path* through a pre-semi-tableau is an infinite sequence (v0, σ0), (v1, σ1), ... ∈ (W × Σ)<sup>ω</sup> such that for all i, vi+1 ∈ L(vi, σi). We denote *the* first state that is reachable by zero or more L-steps from a node v ∈ W by ⌈v⌉ (since there is no L-cycle within U, such a state always exists).

Given a state v, the relation L of a pre-semi-tableau thus picks a set L(v) of nodes in which l(v) is one-step satisfiable; given a prestate u, L picks a single (pre)state that is obtained from u by transforming the formula ψu.

**Definition 16 (Tracking timeouts).** Given a path ρ = (v0, σ0), (v1, σ1), ... through a pre-semi-tableau, we say that priority i *occurs* (at position j) in ρ if β(vj, σj, vj+1) = i, recalling that β is the priority function of the determinized and complemented tracking automaton Bχ. Then the path ρ has *tracking timeouts* m = (m2n0k, ..., m1) if for each odd 1 ≤ i < 2n0k, priority i occurs at most mi times in ρ before some priority greater than i occurs in ρ. Nothing is said about the mi for even i, which are in fact irrelevant and serve only to ease notation. A node w ∈ W in a pre-semi-tableau (W, L) has *tracking timeouts* m if every path through (W, L) starting at w has tracking timeouts m. A pre-semi-tableau (W, L) *has tracking timeouts* if each w ∈ W has tracking timeouts m for some m.

Intuitively, a pre-semi-tableau (W, L) has tracking timeouts if every word that encodes an infinite L-path through W is accepted by Bχ. The next definition is geared towards characterizing non-acceptance by Aχ:

**Definition 17 (Traces and unfolding timeouts).** Let (W, L) be a graph with L ⊆ W × Σ × W and labeling function l : W → P(**F**). Given an L-path ρ = (v0, σ0), (v1, σ1), ... (with (vi, σi, vi+1) ∈ L for i ≥ 0) and a sequence of formulas Ψ = ψ0, ψ1, ..., we say that Ψ is a *trace* of ψ0 along ρ (we also say that ρ *contains* the trace Ψ) if ψi ∈ l(vi) and ψi+1 ∈ Δ(ψi, σi) for all i. For i with ψi = ηX.ψ for some fixpoint variable X and some formula ψ, we say that Ψ *unfolds at level* ad(ψi) at position i. Then the trace Ψ has *unfolding timeout* m ∈ ℕ for ψ0 at level j if Ψ unfolds at most m times at level j before Ψ unfolds at some level greater than j. The path ρ has *unfolding timeouts* for ψ0 at level j if there is, for each of its traces Ψ of ψ0, some m such that Ψ has unfolding timeout m for ψ0 at level j. A node w ∈ W has *unfolding timeouts* at level j for a formula ψ if every path through (W, L) that starts at w and that contains infinitely many steps (vi, σi) such that σi ∈ selections has unfolding timeouts for ψ at level j. (Since fixpoint variables are by assumption guarded by modal operators, it suffices to require timeouts just for such paths that contain infinitely many modal steps.) A node w ∈ W has *unfolding timeouts* m = (mk, ..., m1) for a formula ψ if every path through (W, L) that starts at w and that contains infinitely many steps (vi, σi) such that σi ∈ selections has, for each odd 1 ≤ i ≤ k, unfolding timeout mi for ψ at level i; again the unfolding timeouts for even i, that is, for greatest fixpoints, are irrelevant. The graph (W, L) has *unfolding timeouts* if for each element w ∈ W and each formula ψ ∈ l(w), there is some vector m such that w has unfolding timeouts m for ψ. We denote the set of nodes that have unfolding timeouts m for ψ by uto(ψ, m) ⊆ W.

A graph (W, L) has unfolding timeouts if for all words that encode an infinite L-path through (W, L), all runs of the nondeterministic tracking automaton A<sup>χ</sup> on the word are *non*-accepting. We recall that a run of A<sup>χ</sup> is accepting if it unfolds some least fixpoint infinitely often without having it dominated.

**Lemma 18.** *Let* (W, L) *be a pre-semi-tableau. Then* (W, L) *has tracking timeouts if and only if it has unfolding timeouts.*

*Proof.* We recall that Bχ is obtained from Aχ by determinization and subsequent complementation, so that L(Bχ) is the complement of L(Aχ). The result thus follows directly from the fact that having tracking timeouts means that Bχ accepts all words that encode a path in (W, L), while having unfolding timeouts means that Aχ does not accept any word that encodes a path in (W, L).

**Lemma 19.** *We have* v<sup>0</sup> ∈ **E** *if and only if there is a pre-semi-tableau for* χ *that has tracking timeouts.*

Combining Lemmas 19 and 18, we obtain

**Corollary 20.** *We have* v<sup>0</sup> ∈ **E** *if and only if there is a pre-semi-tableau for* χ *that has unfolding timeouts.*

We now show that satisfiability of χ and the existence of a pre-semi-tableau for χ with unfolding timeouts coincide.

**Definition 21.** Given a pre-semi-tableau (W, L) with set of states V , we put

$$\widehat{[\psi]} = \{v \in V \mid l(v) \vdash\_{\mathsf{PL}} \psi\} \qquad \widehat{[\psi]}\_{\overline{m}} = \widehat{[\psi]} \cap \{\lceil u \rceil \in V \mid u \in \mathsf{uto}(\psi, \overline{m})\}$$

where ψ ∈ **F**, where ⊢<sub>PL</sub> denotes propositional entailment, and where m is a vector of k natural numbers.

Thus we have v ∈ [[ψ]]<sub>m</sub> if there is a node u ∈ W such that ⌈u⌉ = v and u has timeouts m for ψ. This serves to ease the proofs of the upcoming existence and truth lemmas, as it anchors the timeout vector m at the node u instead of anchoring it at the state v, which may not have timeouts m for ψ (namely, if a greatest fixpoint is unfolded on the L-path from u to v).

**Definition 22 (Strong coherence).** Let (W, L) be a pre-semi-tableau with set V of states. A coalgebra C = (V, ξ) is *strongly coherent* if for all states v ∈ V , for all formulas ♥ψ ∈ **F** and for all timeout-vectors m,

$$v \in \widehat{[\heartsuit\psi]}\_{\overline{m}} \text{ implies } \xi(v) \in [\![\heartsuit]\!](\widehat{[\psi]}\_{\overline{m}}) .$$

Strongly coherent coalgebras exist over pre-semi-tableaux:

**Lemma 23 (Existence).** *Let* (W, L) *be a pre-semi-tableau with set of states* V *. Then there is a strongly coherent coalgebra over* V *.*

Since all least fixpoint literals are satisfied after finitely many unfolding steps in strongly coherent coalgebras with unfolding timeouts, they are models, i.e. satisfy all the formulas in their labels:

**Lemma 24 (Truth).** *In strongly coherent coalgebras that have unfolding timeouts, we have that for all* ψ ∈ **F***,*

$$
\widehat{[\psi]} \subseteq [\psi].
$$

**Definition 25 (Timed-out satisfaction).** Given sets U ⊆ W, a function f : P(W) → P(W) and an ordinal number λ, we define f<sup>λ</sup>(U) = U if λ = 0, f<sup>λ</sup>(U) = f(f<sup>λ′</sup>(U)) if λ = λ′ + 1, and f<sup>λ</sup>(U) = ⋃<sub>κ<λ</sub> f<sup>κ</sup>(U) if λ is a limit ordinal. The target formula χ is clean, so that it contains, for each fixpoint variable X ∈ **V**, at most a single fixpoint literal ηX.ψ0 as a subformula; we denote this formula by θ(X). Given a coalgebra (C, ξ), a formula ψ and a vector λ = (λk, ..., λ1) of ordinal numbers, we define [[ψ]]<sup>λ</sup> = [[ψ]]<sub>i</sub>, where i : **V** → P(C) is defined, for fixpoint variables Xj that occur freely in ψ and for which we have θ(Xj) = ηXj.ψj, by i(Xj) = ([[ψj]]<sup>Xj</sup><sub>i′</sub>)<sup>λj</sup>(∅) if η = μ and by i(Xj) = [[νXj.ψj]]<sub>i′</sub> if η = ν, where i′(Xj′) is undefined for j′ ≥ j and where i′(Xj′) = i(Xj′) for j′ < j. Again the timeouts for greatest fixpoint variables are irrelevant and serve only to ease notation.

**Definition 26 (Strongly supporting Kripke frame).** Let (C, ξ) be a coalgebra. For states x ∈ C and formulas ψ such that x ∈ [[ψ]], let λ<sup>ψ</sup> denote the least vector of ordinal numbers such that x ∈ [[ψ]]<sup>λψ</sup>. Also let, for ψ ∈ **F**, ψ′ be *the* subformula of χ such that ψ is obtained from ψ′ by repeatedly replacing free variables X by θ(X). A graph (C, L) with L ⊆ C × Σ × C and with labeling function l : C → P(**F**) such that l(x) = {ψ ∈ **F** | x ∈ [[ψ]]} is a *strongly supporting Kripke frame* (for (C, ξ)) if


**Lemma 27.** *Every coalgebra has a strongly supporting Kripke frame.*

**Definition 28.** Given a coalgebra (C, ξ) with strongly supporting Kripke frame (C, L), a formula ψ and a valuation i : **V** → P(C), we define [[ψ]]<sup>L</sup><sub>i</sub> by the same clauses as [[ψ]]<sub>i</sub> in all cases except the following:

$$\begin{aligned} \left[\!\left[\psi\_0 \lor \psi\_1\right]\!\right]\_i^L &= \{x \in C \mid x \in [\![\psi\_b]\!]\_i^L,\ b \in \{0, 1\},\ L(x, (\phi\_0 \lor \phi\_1, b)) = \{x\}\} \\ \left[\!\left[\heartsuit \psi\_0\right]\!\right]\_i^L &= \{x \in C \mid (Tg\_x)(\xi(x)) \in [\![\heartsuit]\!](g\_x[[\![\psi\_0]\!]\_i^L])\} \\ \left[\!\left[\mu X.\psi\_0\right]\!\right]\_i^L &= \{x \in C \mid x \text{ has unfolding timeouts at level } \mathrm{ad}(\mu X.\phi\_0) \text{ for } \mu X.\phi\_0 \text{ in } (C, L)\}, \end{aligned}$$

where (μX.ψ0)′ = μX.φ0 and (ψ0 ∨ ψ1)′ = φ0 ∨ φ1, and where g<sub>x</sub> : C → {y<sub>κ</sub> | L(x, κ) = {y<sub>κ</sub>}} is defined by g<sub>x</sub>(c) = y<sub>κ</sub> if and only if κ = {♥ψ ∈ **F** | c ∈ [[ψ]]}.

Strongly supporting Kripke frames have unfolding timeouts:

**Lemma 29.** *For all coalgebras* (C, ξ) *with strongly supporting Kripke frame* (C, L)*, all formulas* ψ *and all valuations* i : **V** → P(C)*, we have* [[ψ]]<sub>i</sub> ⊆ [[ψ]]<sup>L</sup><sub>i</sub>*.*

**Lemma 30 (Soundness).** *Let* χ *be satisfiable. Then a pre-semi-tableau for* χ *with unfolding timeouts can be constructed over a subset of* Dχ*.*

*Proof (Sketch).* By Lemmas 27 and 29, any model of χ has a strongly supporting Kripke frame (C, L) with unfolding timeouts. We derive a pre-semi-tableau for χ from (C, L), inheriting unfolding timeouts.

**Corollary 31 (Soundness and completeness).** *We have*

v<sup>0</sup> ∈ **E** *if and only if* χ *is satisfiable*.

Our model construction moreover yields the same bound on minimum model size as in earlier work on the coalgebraic μ-calculus [4]:

**Corollary 32 (Small model property).** *Let* χ *be a satisfiable coalgebraic* μ*-calculus formula. Then* χ *has a model of size* O(((nk)!)<sup>2</sup>) ⊆ 2<sup>O(nk log n)</sup>*.*

# **6 Conclusion**

We have shown that the satisfiability problem of the coalgebraic μ-calculus is in ExpTime, subject to establishing a suitable time bound on the much simpler one-step satisfiability problem. Prominent examples include the graded μ-calculus, the (two-valued) probabilistic μ-calculus, and extensions of the probabilistic and the graded μ-calculus, respectively, with (monotone) polynomial inequalities; the ExpTime bound appears to be new for the last two logics. We have also presented a generic satisfiability algorithm that realizes the time bound and supports global caching and on-the-fly solving. Moreover, we have obtained a polynomial bound on minimum branching width in models for all example logics mentioned above.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Constructing Inductive-Inductive Types in Cubical Type Theory**

Jasper Hugunin(B)

University of Washington, Seattle, WA, USA jasper@hugunin.net

**Abstract.** Inductive-inductive types are a joint generalization of mutual inductive types and indexed inductive types. In extensional type theory, inductive-inductive types can be constructed from inductive types, and this construction has been conjectured to work in intensional type theory as well. In this paper, we show that the existing construction requires Uniqueness of Identity Proofs, and present a new construction (which we conjecture generalizes) of one particular inductive-inductive type in cubical type theory, which is compatible with homotopy type theory.

## **1 Introduction**

Inductive-inductive types allow for the mutual inductive definition of a type and a family over that type. As an example, we can simultaneously define contexts and types defined in a context, with dependently typed context extension:


Such definitions have been used for example by Danielsson [9] and Chapman [5] to define intrinsically typed syntax of a dependent type theory, and Agda supports such definitions natively.

These types have been studied extensively in Nordvall Forsberg [15]. There, in §5.3, inductive-inductive types with simple elimination rules (defined in op. cit. §3.2.5) are constructed from indexed inductive types in extensional type theory, and in §5.4 this is conjectured to work in intensional type theory as well.

In this paper, we first show that this construction does not work in intensional type theory without assuming Uniqueness of Identity Proofs (UIP), which is incompatible with the Univalence axiom of Homotopy Type Theory [18]. We then give an alternate construction in cubical type theory [6], which is compatible with Univalence. Specifically, this paper makes the following contributions:<sup>1</sup>

<sup>1</sup> The formalization can be found at https://github.com/jashug/ConstructingII.

c The Author(s) 2019 M. Boja´nczyk and A. Simpson (Eds.): FOSSACS 2019, LNCS 11425, pp. 295–312, 2019. https://doi.org/10.1007/978-3-030-17127-8\_17


#### **1.1 Syntax and Conventions**

We mostly mimic Agda syntax. The double bar symbol = is used for definitions directly and by pattern matching, and for equality of terms up to conversion. We write (a : A) → B for the dependent product type, and A → B for the non-dependent version. Functions are given by pattern matching f x = y or by lambda expressions f = λx.y. Similarly (a : A) × B is the dependent pair type, and A × B the non-dependent version. Pairs are (a, b), and projections are p.1 and p.2. The unit type is ⊤, with unique inhabitant tt. Identity types are x ≡<sub>X</sub> y for the type of identifications of x with y in type X, and we write refl for a proof of reflexivity. We do not assume that axiom K holds for identity types. We write Type for a universe of types (where Agda uses Set). In Sect. 3 we work in cubical type theory, which will be explained there.

#### **1.2 Running Example of an Inductive-Inductive Definition**

For the purposes of this paper, we will focus on one relatively simple inductive-inductive definition (with only 5 clauses), parametrized by a type X, which is given in Fig. 1. We will use this definition to prove that Nordvall Forsberg's construction implies UIP in Sect. 2 and as a running example to demonstrate our construction in cubical type theory in Sect. 3.

Our example starts with the simplest inductive-inductive sorts, taking A : Type and B : A → Type, and then populates A and B with simple constructors which suffice for our proof of UIP. We have inj, which is supposed to give exactly one element of each B a, while ext lets us mix Bs back into the As (mirroring the type of context extension), and η gives us something to start with: one element of A for each element of X (following the use of η in [15, Example 3.3]). The proof of UIP in Sect. 2 proceeds by considering the type B (ext (η x) (inj (η x))) for some x : X, and noticing that, while the simple elimination rules tell us that there should only be one element of this type (given by inj), in Nordvall Forsberg's construction there are actually as many as there are proofs of x ≡<sub>X</sub> x.

Our goal in this paper is to construct (A, B, η, ext, inj) of the types given in Fig. 1 such that the simple elimination rules hold without using UIP. But first, we will show why Nordvall Forsberg's approach is not sufficient.

# **2 Deriving UIP**

Uniqueness of Identity Proofs (UIP) for a type X is the principle that, for all x : X, y : X, p : x ≡<sub>X</sub> y, q : x ≡<sub>X</sub> y, the type p ≡ q is inhabited.

#### **Fig. 1.** Running example

Equivalently, for all x : X, p : x ≡<sub>X</sub> x, the type p ≡ refl is inhabited. It expresses that there is at most one proof of any equality. UIP is independent of standard intensional type theory [13], and is inconsistent with Homotopy Type Theory [18].
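The equivalence of the two formulations can be sketched in Lean-style notation (our own rendering; the names UIP and UIP_refl are ours): the special case follows from the general principle by instantiation, and the converse direction uses only path induction, which is available without axiom K.

```lean
-- General form: any two proofs of any equality agree.
def UIP (X : Type) : Prop :=
  ∀ (x y : X) (p q : x = y), p = q

-- Special form: any self-identification is reflexivity.
def UIP_refl (X : Type) : Prop :=
  ∀ (x : X) (p : x = x), p = Eq.refl x

-- Path induction (`cases q`) needs no axiom K here,
-- since the endpoint y is fully general.
theorem uip_of_uip_refl {X : Type} (h : UIP_refl X) : UIP X := by
  intro x y p q
  cases q
  exact h x p
```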

Nordvall Forsberg's construction of inductive-inductive types is described in [15, §5.3]. In this section, we show that if the simple elimination rules hold for this construction of the inductive-inductive type in Fig. 1, then UIP holds for the type X (Theorem 1). This argument has been formalized in both Coq version 8.8.0 [8] (see UIP from Forsberg II.v) and Agda using the --without-K flag (see UIP from Forsberg II.agda).

To recap, Nordvall Forsberg [15, §5.3] constructs an inductive-inductive type by first defining an approximation (the *pre-syntax*), which drops the A index from B, leaving a mutual inductive definition. Concretely, we have Apre and Bpre defined as in Fig. 2. Then a mutual indexed inductive definition is used to define the index relationship between Apre and Bpre; these are the goodness predicates Agood and Bgood. Finally, the inductive object (A, B, η, ext, inj) is defined by pairing the pre-syntax with goodness proofs (see Fig. 3).
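The pre-syntax can be rendered as a plain mutual inductive definition, sketched here in Lean-style notation (our own transcription of the constructors of Fig. 1 with the A-index dropped):

```lean
mutual
  -- Apre keeps the constructors of A, but ext now takes a bare Bpre
  inductive Apre (X : Type) : Type where
    | eta : X → Apre X
    | ext : Apre X → Bpre X → Apre X
  -- Bpre drops the A-index of B entirely
  inductive Bpre (X : Type) : Type where
    | inj : Apre X → Bpre X
end
```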

In extensional type theory, Nordvall Forsberg proved that Agood a is a mere proposition (all inhabitants are equal) [15, Lemma 5.37(ii)]. In intensional type theory as well, if function extensionality and UIP hold, then Agood is a mere

#### **Fig. 2.** Pre-syntax for the running example

proposition. This uniqueness of goodness proofs justifies having the definition of B ignore the goodness proof agood, since agood can have at most one value.

In the next two subsections, we prove that:


Combining these results, we conclude that Nordvall Forsberg's construction satisfies the simple elimination rules in intensional type theory only if UIP holds (Theorem 1).

#### **2.1 Unique Goodness Implies UIP**

We define notation (x == y) to mean the term

extpre (ηpre x) (injpre (ηpre y)) : Apre.

We first prove that there are at least as many proofs of Agood (x == y) as there are of x ≡<sub>X</sub> y.

**Lemma 1 (**x ≡<sub>X</sub> y **is a retract of** Agood**).** *For all* x : X *and* y : X*, there are functions*

$$f: x \equiv\_X y \to A\_{good} \left( x == y \right), \qquad g: A\_{good} \left( x == y \right) \to x \equiv\_X y,$$

*such that for all* e : x ≡<sub>X</sub> y*,* g (f e) ≡ e*.*

*Proof.* To define f, we let f refl =

$$\mathtt{ext}\_{\mathtt{good}}\left(\eta\_{\mathtt{pre}}\,\boldsymbol{x}\right)\left(\eta\_{\mathtt{good}}\,\boldsymbol{x}\right)\left(\mathtt{inj}\_{\mathtt{pre}}\left(\eta\_{\mathtt{pre}}\,\boldsymbol{x}\right)\right)\left(\mathtt{inj}\_{\mathtt{good}}\left(\eta\_{\mathtt{pre}}\,\boldsymbol{x}\right)\left(\eta\_{\mathtt{good}}\,\boldsymbol{x}\right)\right).$$

To define g, pattern matching on agood has only one possibility: agood =

extgood (ηpre x) (ηgood x) (injpre (ηpre x)) (injgood (ηpre x) (ηgood x)),

forcing y to be x, and in this case x ≡<sub>X</sub> y holds by reflexivity. Then when e = refl, f e returns a proof in the format matched by g, so g (f refl) ≡ refl, and thus g (f e) ≡ e.

**Lemma 2 (Unique goodness implies UIP).** *If* <sup>A</sup>*good* <sup>t</sup> *is a mere proposition for all* t : A*pre, then UIP holds for the type* X*.*

*Proof.* Assume goodness proofs are unique, and take x : X, y : X, with p : x ≡ y and q : x ≡ y. We want to show that p ≡ q. Using the f and g from Lemma 1, we have p ≡ g (f p) and q ≡ g (f q). Since A_good (x == y) is a mere proposition, f p ≡ f q, and therefore p ≡ g (f p) ≡ g (f q) ≡ q.


#### **2.2 Simple Elimination Rules Imply Unique Goodness**

Now we prove that there are at least as many proofs of B (t_pre, t_good) as there are of A_good t_pre.

**Lemma 3 (**A_good **is a retract of** B**).** *For all* t_pre : A_pre *and* t_good : A_good t_pre*, there are functions*

$$f: A\_{good} \, t\_{pre} \to B\left(t\_{pre}, t\_{good}\right), \qquad g: B\left(t\_{pre}, t\_{good}\right) \to A\_{good} \, t\_{pre}$$

*such that for all* a_good : A_good t_pre*,* g (f a_good) ≡ a_good*.*

*Proof.* We define f a_good = (inj_pre t_pre, inj_good t_pre a_good). By induction on B_good, we define a function

$$g' : (a\_{\mathrm{pre}} : A\_{\mathrm{pre}}) \to (b\_{\mathrm{pre}} : B\_{\mathrm{pre}}) \to B\_{\mathrm{good}}\ a\_{\mathrm{pre}}\ b\_{\mathrm{pre}} \to A\_{\mathrm{good}}\ a\_{\mathrm{pre}}$$

taking

$$g'\ a\_{\text{pre}}\left(\mathbf{inj}\_{\text{pre}}\ a\_{\text{pre}}\right)\left(\mathbf{inj}\_{\text{good}}\ a\_{\text{pre}}\ a\_{\text{good}}\right) = a\_{\text{good}}.$$

Then we can define g (b_pre, b_good) = g′ t_pre b_pre b_good, and g (f a_good) ≡ a_good holds by reflexivity.

**Lemma 4 (**B a **is contractible).** *Assuming the simple elimination rules from Fig. 1 hold for the* (A, B, η, *inj*, *ext*) *constructed above, for all* a : A *and* b : B a*,* *inj* a ≡_{B a} b*.*

*Proof.* Referring to the simple elimination rules given in Fig. 1, we pattern match on B by giving motives (P_A, P_B) and methods (P_η, P_ext, P_inj), and then use the resulting E_B.

We set P_A a = ⊤, and take P_B a b = inj a ≡_{B a} b. Then we have P_η x = ⋆, and P_ext a b_H = ⋆, and we take P_inj a = refl : inj a ≡_{B a} inj a. The conclusion follows by E_B : (a : A) → (b : B a) → inj a ≡_{B a} b.

**Lemma 5 (Simple elimination rules imply unique goodness).** *If the simple eliminators hold for the* (A, B, η, *inj*, *ext*) *constructed above, then for all* t : A_pre*,* A_good t *is a mere proposition.*

*Proof.* Assume that the simple elimination rules hold, and take t : A_pre, and a₁ and a₂ in A_good t. We use the definitions of f and g from Lemma 3 with t_pre = t and t_good = a₁.

By Lemma 4, we know that

$$\mathsf{inj}\ (t, a\_1) \equiv\_{B\ (t, a\_1)} f\ a\_2.$$

Applying g to both sides, and recognizing that g (inj (t, a₁)) computes to a₁ while g (f a₂) computes to a₂, we find that

$$a\_1 = g\left(\mathsf{inj}\ (t, a\_1)\right) \equiv\_{A\_{\mathrm{good}}\ t} g\left(f\ a\_2\right) = a\_2.$$

#### **2.3 Simple Elimination Rules for Nordvall Forsberg's Construction only if UIP**

**Theorem 1.** *If the simple elimination rules hold for Nordvall Forsberg's construction, then UIP holds for the type* X*.*

*Proof.* Compose the results of Lemmas 2 and 5.

Therefore Nordvall Forsberg's approach to constructing inductive-inductive types requires UIP. Since UIP is inconsistent with the Univalence axiom at the center of Homotopy Type Theory (HoTT) [18], we have an incentive to come up with a different construction which is consistent with HoTT.

### **3 Constructing an Inductive-Inductive Type in Cubical Type Theory**

Cubical type theory [6] is a recently developed type theory which gives a constructive interpretation of the Univalence axiom of Homotopy Type Theory. It has an implementation as a mode for Agda [19], which we use to formalize the construction given in this section of the running example from Fig. 1.

The most important difference between cubical type theory and standard intensional type theory as implemented by Coq or vanilla Agda is that the identity type x ≡_X y is represented (loosely speaking) by the type of functions p from an interval type I with two endpoints i₀ and i₁ to X, such that p i₀ reduces to x and p i₁ reduces to y. This allows, for example, a simple proof of function extensionality: if we have A : Type, B : A → Type, f and g functions of type (a : A) → B a, and h : (a : A) → f a ≡ g a, then we have (λi.λa.h a i) : f ≡ g. Taking cong f = λp.λi.f (p i) : x ≡ y → f x ≡ f y and ◦ for function composition, we also have nice properties such as (cong f) ◦ (cong g) = cong (f ◦ g).
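Checking the endpoints of this path term makes the function extensionality proof explicit (a routine verification, using η for functions):

$$\big(\lambda i.\,\lambda a.\,h\ a\ i\big)\ i\_0 \;=\; \lambda a.\,h\ a\ i\_0 \;=\; \lambda a.\,f\ a \;=\; f, \qquad \big(\lambda i.\,\lambda a.\,h\ a\ i\big)\ i\_1 \;=\; \lambda a.\,h\ a\ i\_1 \;=\; \lambda a.\,g\ a \;=\; g,$$

so the term is indeed a path from f to g.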

In this section, we construct the running example from Fig. 1, along with the simple elimination rules, in cubical type theory. Our construction proceeds in several steps:


$$\mathbb{O},\; S\,\mathbb{O},\; S\,(S\,\mathbb{O}),\; \dots,\; S^n\,\mathbb{O},\; \dots$$

and show that it is nice. This is the only section that utilizes the differences between cubical type theory and standard intensional type theory.

Given the nice goodness algebra from Sect. 3.5, we can then construct the simple elimination rules using the results of Sect. 3.3. This construction has been formalized in Agda<sup>2</sup> using the --cubical flag, which implies --without-K (see RunningExample.agda).

The intuition for our construction is that Nordvall Forsberg's approach of pairing an approximation with goodness predicates can be repeated, and each time the approximation gets better. Using HoTT terminology, we showed in

<sup>2</sup> Agda version 2.6.0 commit bd338484d.

Sect. 2 that one iteration suffices only if X has homotopy level 0 (is a homotopy set, satisfies UIP). In general, n + 1 iterations are sufficient if and only if X has homotopy level n. The successor goodness algebra defined in Sect. 3.4 is a slightly simplified version of Nordvall Forsberg's construction, and taking the limit (in Sect. 3.5) gives a construction which works for arbitrary homotopy levels.

#### **3.1 Pre-syntax**

The pre-syntax is the same as that used in Sect. 2, defined as a mutually inductive type in Fig. 2. The constructors of the pre-syntax have the same types as the constructors of the full inductive-inductive definition (given in Fig. 1), except we replace B a with B_pre everywhere, ignoring the dependence of B on A.

Consider this as the closest approximation of the target inductive-inductive type by a standard inductive type; the dependence of B on A is the only new element that inductive-inductive definitions add. Of course, this is only an approximation. We can form elements of the pre-syntax, such as

$$\mathtt{ext}\_{\mathtt{pre}}\left(\eta\_{\mathtt{pre}}\,\boldsymbol{x}\right)\left(\mathtt{inj}\_{\mathtt{pre}}\left(\eta\_{\mathtt{pre}}\,\boldsymbol{y}\right)\right)$$

for x ≠ y that should be excluded from the inductive-inductive formulation, since inj (η y) : B (η y) while ext (η x) : B (η x) → A.
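As a concrete (and deliberately naive) illustration, the pre-syntax is just a plain mutual inductive type and can be modeled in any language with recursive data. In the hypothetical Python sketch below, the ill-typed term above is nevertheless representable, because the dependence of B on A has been erased:

```python
from dataclasses import dataclass

# Pre-syntax of Fig. 2 as plain trees: B_pre carries no index over A_pre.
# (Illustrative Python rendering only; X is modeled by arbitrary values.)

@dataclass(frozen=True)
class EtaPre:            # eta_pre : X -> A_pre
    x: object

@dataclass(frozen=True)
class ExtPre:            # ext_pre : A_pre -> B_pre -> A_pre
    a: object            # an A_pre
    b: object            # a B_pre

@dataclass(frozen=True)
class InjPre:            # inj_pre : A_pre -> B_pre
    a: object            # an A_pre

# The term from the text: well-formed as pre-syntax even though x and y differ,
# while the inductive-inductive typing discipline would rule it out.
bad = ExtPre(EtaPre("x"), InjPre(EtaPre("y")))
```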

We will use definitions by induction and by pattern matching on the pre-syntax in Sects. 3.3 and 3.4 respectively.

#### **3.2 Goodness Algebras**

As we saw in Sect. 3.1, the pre-syntax is too lenient, and contains terms we want to exclude from the inductive-inductive object. In this section, we define a notion of sub-algebra of the pre-syntax, which we will call a *goodness algebra*, and explain how to combine a goodness algebra with the pre-syntax to get an inductive-inductive object (A, B, η, ext, inj). We also define a goodness algebra O.

In Fig. 4, for each clause of the inductive-inductive specification, we define 3 things:


#### **Fig. 4.** Goodness algebras

3. A way to combine the goodness algebra with the pre-syntax to form an inductive-inductive object. For sorts, we pair the pre-syntax with a goodness proof, while for operations we apply the operation given by the goodness algebra, mimicking the construction in Fig. 3.

Comparing this definition to the construction in Sect. 2, the mutual inductive definition of A_good and B_good (in Fig. 3) has types equivalent to the result of dropping the dependence of δ^G.B on δ^G.A (defined in Fig. 4), going from

δ^G.B : ((a : A_pre) × δ^G.A a) → B_pre → Type to B_good : A_pre → B_pre → Type.

The other difference is that we replace the inductive index (call it s) in the conclusion by a fresh variable φ, with the condition s ≡ φ included in Arg.

#### **3.3 Niceness**

In this section, we identify a property *niceness* that is sufficient for a goodness algebra to produce an inductive-inductive object (A, B, η, ext, inj) which satisfies the simple elimination rules, as given in Fig. 1.

To define niceness, we use the concept of equivalence, as defined in the Univalent Foundations Program book [18] (§4.4, contractible fibers). Given a function f : A → B, we write isEquiv f (leaving A and B implicit) to denote that f is an equivalence between A and B. We will also write A ≃ B for the type of pairs of a function f with a proof that f is an equivalence.
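For reference, the contractible-fibers definition cited above unfolds as follows: f is an equivalence when every fiber of f is contractible,

$$\mathsf{isEquiv}\ f \;:=\; \prod\_{b : B}\ \mathsf{isContr}\Big(\sum\_{a : A} f\ a \equiv\_B b\Big).$$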

We will say that a goodness algebra is *nice* if we have equivalence proofs (δ^N.η, δ^N.ext, δ^N.inj), with types

$$\begin{aligned} \delta^N. \eta \ x \ \phi: \text{isEquiv } (\delta^G. \eta \ x \ \phi), \\ \delta^N. \textbf{ext } (a, b) \ \phi: \text{isEquiv } (\delta^G. \textbf{ext } (a, b) \ \phi), \\ \delta^N. \textbf{inj } a \ \phi: \text{isEquiv } (\delta^G. \textbf{inj } a \ \phi). \end{aligned}$$

Equivalences between types are very close to equalities between types (the Univalence axiom makes this precise). If we have a *nice* goodness algebra, the combined data looks similar to a recursive definition:

$$\begin{aligned} \delta^G.A: \top &\to A\_{\text{pre}} \to \text{Type},\\ \delta^G.B: ((a:A\_{\text{pre}}) \times \delta^G.A \star a) &\to B\_{\text{pre}} \to \text{Type},\\ \delta^G.A\ \phi\left(\eta\_{\text{pre}}\,x\right) &\simeq \text{Arg}\,\eta\,x\ \phi,\\ \delta^G.A\ \phi\left(\text{ext}\_{\text{pre}}\,a\,b\right) &\simeq \text{Arg}\,\text{ext}\,\left(a,b\right)\,\phi,\\ \delta^G.B\ \phi\left(\text{inj}\_{\text{pre}}\,a\right) &\simeq \text{Arg}\,\text{inj}\,\,a\ \phi. \end{aligned}$$

However, the dependence of δ^G.B on δ^G.A makes this what Nordvall Forsberg calls a "recursive-recursive" definition, and so we cannot use the standard eliminator of the pre-syntax. In Sect. 3.5, we will expend much effort to construct a solution to this system. Once we have done so, the inductive-inductive object produced by the goodness algebra will satisfy the simple elimination rules, as we show in the following lemma.

**Lemma 6 (Nice goodness algebras give simple elimination rules).** *Given a goodness algebra* δ^G *with proof of niceness* δ^N*, the inductive-inductive object* (A, B, η, *ext*, *inj*) *produced from* δ^G *as specified in Sect. 3.2 satisfies the simple elimination rules given in Fig. 1.*

*Proof.* The proof is formalized in RunningExample.agda. The main idea of the proof is to induct on the pre-syntax, and exploit the equivalences provided by the niceness proof δ^N. In the inj case, for example, we have a proof of δ^G.B φ (inj_pre a). Without loss of generality, we can replace that goodness proof with δ^G.inj applied to an element of Arg inj a φ, which contains both a proof a_good : δ^G.A a and a proof that (a, a_good) ≡ φ. Using J to eliminate that equality leaves a goal to which the provided simple induction step for inj applies. This proof does not use cubical type theory in any essential way.

#### **3.4 Successor Goodness Algebra**

We are trying to create a nice goodness algebra by taking the limit of successive approximations, so we need a step function, which we will call S, that takes a goodness algebra δ^G and returns a new goodness algebra S δ^G, which is closer in some sense to being nice. We do so by pattern matching on the pre-syntax to unroll one level of the recurrence equations that niceness encodes.

We define by pattern matching

$$\begin{aligned} (E\ \delta^G).A &: (a : A\_{\mathrm{pre}}) \to (\phi : \mathrm{Ix}\ A\ \delta^G) \to (Y : \mathrm{Type}) \times (Y \to \delta^G.A\ \phi\ a),\\ (E\ \delta^G).B &: (b : B\_{\mathrm{pre}}) \to (\phi : \mathrm{Ix}\ B\ \delta^G) \to (Y : \mathrm{Type}) \times (Y \to \delta^G.B\ \phi\ b),\\ (E\ \delta^G).A\ (\eta\_{\mathrm{pre}}\ x) &= \lambda\phi.\ (\mathrm{Arg}\ \eta\ \delta^G\ x\ \phi,\ \delta^G.\eta\ x\ \phi),\\ (E\ \delta^G).A\ (\mathrm{ext}\_{\mathrm{pre}}\ a\ b) &= \lambda\phi.\ (\mathrm{Arg}\ \mathrm{ext}\ \delta^G\ (a,b)\ \phi,\ \delta^G.\mathrm{ext}\ (a,b)\ \phi),\\ (E\ \delta^G).B\ (\mathrm{inj}\_{\mathrm{pre}}\ a) &= \lambda\phi.\ (\mathrm{Arg}\ \mathrm{inj}\ \delta^G\ a\ \phi,\ \delta^G.\mathrm{inj}\ a\ \phi), \end{aligned}$$

which gives a new property Y which maps back to δ^G.B φ b for each b and φ, and similarly for A.

Then, in Fig. 5, we define the new goodness algebra (S δ^G) along with projection functions (δ^π δ^G) which take Ix and Arg from (S δ^G) to δ^G.

The projection functions (δ^π δ^G) consist of applying the map given by the second component of (E δ^G) everywhere in sight. The sorts are then defined by the first component of (E δ^G), while the operations can be defined to be the corresponding projection function itself.

Concretely, for the sort B, we define (δ^π δ^G).B to map between Ix B (S δ^G) and Ix B δ^G. This consists of applying the function ((E δ^G).A a_pre).2, which we defined by pattern matching above, to a_good. Then, since (S δ^G).B gets an inductive index φ in (S δ^G) but ((E δ^G).B b φ).1 is expecting an inductive index in δ^G, we span the gap with the projection function (δ^π δ^G).B just defined. The definition of A follows the same pattern, but (δ^π δ^G).A is even simpler because Ix A δ^G = ⊤ regardless of what goodness algebra we are working in.

For the operations, consider inj. As with the sorts, we first define a projection function (δ^π δ^G).inj a φ, which maps from Arg inj (S δ^G) to Arg inj δ^G, and we fix up the inductive index φ with (δ^π δ^G).B. For the first component of Arg, we use the function given by the second component of (E δ^G).A to fix up a_good. For the second component, applying the projection (δ^π δ^G).B to the equality proof works out on the left-hand side because all these projection functions

**Fig. 5.** Successor goodness algebra

are doing the same thing: applying the function given by the second component of (E δ^G) everywhere. Finally, we can define (S δ^G).inj = (δ^π δ^G).inj, because (S δ^G).inj a φ is supposed to have codomain

$$(S\ \delta^G).B\ \phi\left(\mathbf{inj}\_{\text{pre}}\ a\right),$$

which is defined to be

$$\left((E\ \delta^G).B\ (\mathrm{inj}\_{\mathrm{pre}}\ a)\ ((\delta^\pi\ \delta^G).B\ \phi)\right).1,$$

which reduces on (inj_pre a) to

$$\text{Arg}\,\mathbf{in}\mathbf{j}\,\delta^G\,\,a\,\left( (\delta^\pi\,\,\delta^G).B\,\,\phi \right),$$

which is exactly the codomain of (δ^π δ^G).inj a φ.

#### **3.5 Limit of Goodness Algebras**

We will now construct a nice goodness algebra by taking the limit of the sequence S^n O and showing that it is nice, where S^n O is defined by recursion on n with S^0 O = O and S^{1+n} O = S (S^n O). But first, we consider the limit of a chain of types.

**Limit of Types.** This subsection is formalized in Chain.agda. In order to take the limit of successive goodness algebras, we need to know how to work with *chains* of types. Specifically, given X : ℕ → I → Type and π : (n : ℕ) → X (n + 1) i₀ → X n i₁, we consider the limit given by the type

$$\mathrm{chain.t}\ X\ \pi = \left(f : (n : \mathbb{N}) \to X\ n\ i\_0\right) \times \left((n : \mathbb{N}) \to f\ n \equiv\_{X\ n} \pi\ n\ (f\ (n+1))\right).$$

If we have x : chain.t X π, then let x.p denote the second projection.
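Ignoring the cubical indexing for a moment, an element of chain.t is just a compatible sequence: a choice f n at every level, with π mapping each level's choice back to the previous one. The following small Python sketch (hypothetical names, checking only finitely many levels) illustrates that compatibility condition:

```python
def in_limit(f, pi, levels=10):
    """Check the chain.t side condition f n == pi n (f (n+1)) on the
    first few levels; a genuine limit element satisfies it for all n."""
    return all(f(n) == pi(n, f(n + 1)) for n in range(levels))

# Example: a constant chain X n = int with pi n the identity map.
# Any constant sequence is a point of the limit ...
assert in_limit(lambda n: 42, lambda n, x: x)
# ... while an incompatible sequence is not.
assert not in_limit(lambda n: n, lambda n, x: x)
```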

This definition is designed to work well in cubical type theory, and uses the interval I and the native heterogeneous equality x ≡_X y, where X : I → Type (we can form p = λi.w : x ≡_X y when p i₀ = x, p i₁ = y, and p i : X i). In particular, this definition allows for dependent chains without transporting over the base equality, which is problematic in cubical type theory because transport gets stuck on neutral types; instead, given

$$\begin{aligned} A: \mathbb{N} \to \text{Type} \qquad \text{with} \qquad f\_A: (n: \mathbb{N}) \to A \ (1+n) \to A \ n \qquad \text{and} \\ B: (n: \mathbb{N}) \to A \ n \to \text{Type} \qquad \text{with} \\ f\_B: (n: \mathbb{N}) \to (a: A \ (1+n)) \to B \ (1+n) \ a \to B \ n \ (f\_A \ n \ a), \end{aligned}$$

we can form

$$\begin{aligned} LA &= \mathrm{chain.t}\ (\lambda n. \lambda i.\ A\ n)\ f\_A &&: \mathrm{Type},\\ LB &= \lambda a.\ \mathrm{chain.t}\ (\lambda n.\ \mathrm{cong}\ (B\ n)\ (a.p\ n))\ (\lambda n.\ f\_B\ n\ (a.p\ (1 + n)\ i\_0)) &&: LA \to \mathrm{Type}, \end{aligned}$$

using cong (B n) (a.p n), which is particularly well behaved in cubical type theory.

This construction commutes with most type formers: dependent function types, dependent pair types, identity types, and constants. We also note a dependent version of the fact that the limit of a chain is equivalent to the limit of a shifted chain, which substitutes for Ahrens et al. [1, Lemma 12].

**Lemma 7 (Dependent chain equivalent to shifted chain).** *Given*

$$\begin{aligned} &X: \mathbb{N} \to \text{Type}, \qquad \pi\_X: (n: \mathbb{N}) \to X \ (1+n) \to X \ n, \\ &Y\_0: (n: \mathbb{N}) \to X \ n \to \text{Type}, \qquad Y\_1: (n: \mathbb{N}) \to X \ n \to \text{Type}, \\ &f: (n: \mathbb{N}) \to (x: X \ n) \to Y\_1 \ n \ x \to Y\_0 \ n \ x, \\ &g: (n: \mathbb{N}) \to (x: X \ (1+n)) \to Y\_0 \ (1+n) \ x \to Y\_1 \ n \ (\pi\_X \ n \ x), \\ &x: \operatorname{chain}.t \ (\lambda n. \lambda i. X \ n) \ \pi\_X, \end{aligned}$$

*and letting the* X *arguments to* f *and* g *be implicit, we can define the types*

$$\begin{aligned} t &= \operatorname{chain}.t \left(\lambda n. \operatorname{cong} \left(Y\_0 \ n\right) \left(x.p \ n\right)\right) \left(\lambda n. \lambda y. f \ n \left(g \ n \ y\right)\right), \\ t^+ &= \operatorname{chain}.t \left(\lambda n. \operatorname{cong} \left(Y\_1 \ n\right) \left(x.p \ n\right)\right) \left(\lambda n. \lambda y. g \ n \left(f \ \left(1+n\right) \ y\right)\right). \end{aligned}$$

*Applying* f *component-wise gives a function from* t <sup>+</sup> *to* t*. This function is an equivalence.*

We only use Lemma 7 when Y₁ n (π_X n x) = Y₀ (1 + n) x, so we may take g to be the identity, leaving t⁺ the shifted chain of t up to X arguments.

**Limit of Goodness Algebras.** Now we use the lemmas about chains to construct a nice goodness algebra, and then conclude by constructing an inductive-inductive object (A, B, η, ext, inj) that satisfies the simple elimination rules.

**Lemma 8.** *A nice goodness algebra exists.*

*Proof.* The sorts of the limit goodness algebra are defined as a chain, and operations act pointwise on each component of the chain. To prove that the operations are equivalences, we compose a proof that Arg commutes with chains (given by combining the lemmas about chains commuting with type formers) with a proof that for each sort, the chain given by (E (S^n O)) is equivalent to the chain given by (S^n O) (given by Lemma 7). Since (E (S^n O)) is defined by pattern matching to reduce to Arg, the right and left sides of these equivalences agree, and we find that the operations are indeed nice. See the formalization for details.

**Theorem 2.** *There exists an inductive-inductive object* (A, B, η, *ext*, *inj*) *that satisfies the simple elimination rules as defined in Fig. 1.*

*Proof.* A nice goodness algebra exists by Lemma 8, therefore we can construct (A, B, η, ext, inj) satisfying the simple elimination rules by Lemma 6.

We have therefore succeeded. In cubical type theory, the inductive-inductive definition from Fig. 1 is constructible.

#### **4 Related Work**

The principle of simultaneously defining a type and a family over that type has been used many times before. Danielsson [9] used an inductive-inductive-recursive definition to define the syntax of dependent type theory, and Chapman [5] used an inductive-inductive definition for the same purpose. Conway's surreal numbers [7] are given (up to a defined equivalence relation) by the inductive-inductive definition of number and less-than, where less-than is a relation indexed by two numbers [15, §7.1]. The HoTT book §11.3 gives a definition of the Cauchy reals as a higher inductive-inductive definition [18].

In his thesis and previous papers [15–17], Nordvall Forsberg studies the general theory of inductive-inductive types, axiomatizing a limited class of such definitions, and giving a set theoretic model showing that they are consistent. He also considers various extensions such as allowing a third type indexed by the first two, allowing the second type to be indexed by two elements of the first, or combining inductive-inductive definitions with inductive-recursive definitions from Dybjer and Setzer [10].

There have been several attempts to define a general class of inductive-inductive types larger than that in Nordvall Forsberg's thesis. Kaposi and Kovács [14] give an external syntactic description of a class which includes higher inductive-inductive types, and Altenkirch et al. [2] give a semantic description of a class including quotient inductive-inductive types, but neither gives a type of codes that can be reasoned about internally. Working with UIP, Altenkirch et al. [4] propose a class of quotient inductive-inductive types.

Nordvall Forsberg's thesis [15] appears to give the best previously known reduction of inductive-inductive types to regular inductive types. As we have shown, Nordvall Forsberg's approach can be applied to intensional type theory only if UIP holds. Furthermore, the equations for both Nordvall Forsberg's approach and our approach only hold propositionally.

Many other structures have been reduced to plain inductive types. Our construction of inductive-inductive types can be seen as an adaptation of the technique in Ahrens et al. [1], where coinductive types are constructed from N by taking a limit. Indexed inductive types (which are used in Nordvall Forsberg's construction) are constructed from plain inductive types in Altenkirch et al. [3], with good computational properties (provided an identity type that satisfies J strictly). And small induction-recursion is reduced to plain indexed inductive types in Hancock et al. [11].

#### **5 Conclusions and Future Work**

In this paper, we have:


We claim that the construction of our specific running example is straightforwardly generalizable to other inductive-inductive types, and have formalized the construction of a number of other examples including types with non-finitary constructors and indices to support this claim (see the GitHub repository referenced in the introduction).

Going forward, we would like to investigate


– In the opposite direction from the previous point, rewriting the construction given here in Coq + Function Extensionality. While the elimination rules will have poor computational behavior, this would make using inductive-inductive types in Coq possible without requiring any change to Coq itself, while being compatible with HoTT. In particular, using cubical type theory makes the proofs in Sect. 3.5 simpler, but we speculate that axiomatic function extensionality is sufficient.

**Acknowledgements.** I would like to thank Talia Ringer and Dan Grossman from the UW PLSE lab, for their invaluable feedback throughout the revision process. I also thank Pavel Panchekha, John Leo, Remy Wang, and Fredrik Nordvall Forsberg for their comments.

Some of this work was completed while studying at Tokyo Institute of Technology under Professor Ryo Kashima. I would like to thank Professor Kashima, as well as my fellow lab members and mentors Asami and Maniwa for making my stay both productive and enjoyable.

# **References**


#### 312 J. Hugunin

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Causal Inference by String Diagram Surgery**

Bart Jacobs<sup>1</sup>, Aleks Kissinger<sup>1</sup>, and Fabio Zanasi<sup>2</sup>

<sup>1</sup> Radboud University, Nijmegen, The Netherlands
<sup>2</sup> University College London, London, UK
f.zanasi@ucl.ac.uk

**Abstract.** Extracting causal relationships from observed correlations is a growing area in probabilistic reasoning, originating with the seminal work of Pearl and others from the early 1990s. This paper develops a new, categorically oriented view based on a clear distinction between syntax (string diagrams) and semantics (stochastic matrices), connected via interpretations as structure-preserving functors.

A key notion in the identification of causal effects is that of an intervention, whereby a variable is forcefully set to a particular value independent of any prior dependencies. We represent the effect of such an intervention as an endofunctor which performs 'string diagram surgery' within the syntactic category of string diagrams. This diagram surgery in turn yields a new, interventional distribution via the interpretation functor. While in general there is no way to compute interventional distributions purely from observed data, we show that this is possible in certain special cases using a calculational tool called comb disintegration.

We showcase this technique on a well-known example, predicting the causal effect of smoking on cancer in the presence of a confounding common cause. We then conclude by showing that this technique provides simple sufficient conditions for computing interventions which apply to a wide variety of situations considered in the causal inference literature.

**Keywords:** Causality · String diagrams · Probabilistic reasoning

#### **1 Introduction**

An important conceptual tool for distinguishing correlation from causation is the possibility of *intervention*. For example, a randomised drug trial attempts to destroy any confounding 'common cause' explanation for correlations between drug use and recovery by randomly assigning a patient to the control or treatment group, independent of any background factors. In an ideal setting, the observed correlations of such a trial will reflect genuine causal influence. Unfortunately, it is not always possible (or ethical) to ascertain causal effects by means of actual interventions. For instance, one is unlikely to get approval to run a clinical trial on whether smoking causes cancer by randomly assigning 50% of the patients to smoke, and waiting a bit to see who gets cancer. However, in certain situations it is possible to predict the effect of such a hypothetical intervention from purely observational data.

In this paper, we will focus on the problem of *causal identifiability*. For this problem, we are given observational data as a joint distribution on a set of variables and we are furthermore provided with a *causal structure* associated with those variables. This structure, which typically takes the form of a directed acyclic graph or some variation thereof, tells us which variables can in principle have a causal influence on others. The problem then becomes whether we can measure how strong those causal influences are, by means of computing an *interventional* distribution. That is, can we ascertain what would have happened if a (hypothetical) intervention had occurred?

Over the past three decades, a great deal of work has been done in identifying necessary and sufficient conditions for causal identifiability in various special cases, starting with very specific notions such as the *back-door* and *front-door* criteria [20] and progressing to more general necessary and sufficient conditions based on the **do**-calculus [11], or combinatoric concepts such as confounded components in semi-Markovian models [25,26].

This style of causal reasoning relies crucially on a delicate interplay between syntax and semantics, which is often not made explicit in the literature. The syntactic object of interest is the causal structure (e.g. a causal graph), which captures something about our understanding of the world, and the mechanisms which gave rise to some observed phenomena. The semantic object of interest is the data: joint and conditional probability distributions on some variables. Fixing a causal structure entails certain constraints on which probability distributions can arise, hence it is natural to see distributions satisfying those constraints as models of the syntax.

In this paper, we make this interplay precise using functorial semantics in the spirit of Lawvere [17], and develop basic syntactic and semantic tools for causal reasoning in this setting. We take as our starting point a functorial presentation of Bayesian networks similar to the one appearing in [7]. The syntactic role is played by string diagrams, which give an intuitive way to represent morphisms of a monoidal category as boxes plugged together by wires. Given a directed acyclic graph (dag) G, we can form a free category Syn_G whose arrows are (formal) string diagrams which represent the causal structure syntactically. Structure-preserving functors from Syn_G to Stoch, the category of stochastic matrices, then correspond exactly to Bayesian networks based on the dag G.

Within this framework, we develop the notion of intervention as an operation of 'string diagram surgery'. Intuitively, this cuts a string diagram at a certain variable, severing its link to the past. Formally, this is represented as an endofunctor on the syntactic category cut*<sup>X</sup>* : Syn*<sup>G</sup>* → Syn*G*, which propagates through a model <sup>F</sup> : Syn*<sup>G</sup>* <sup>→</sup> Stoch to send observational probabilities <sup>F</sup>(ω) to interventional probabilities <sup>F</sup>(cut*X*(ω)).

The cut*<sup>X</sup>* endofunctor gives us a diagrammatic means of computing interventional distributions given complete knowledge of F. However, more interestingly, we can sometimes compute interventional distributions given only partial knowledge of F, namely some observational data. We show that this can also be done via a technique we call *comb disintegration*, which is a string diagrammatic version of a technique called *c-factorisation* introduced by Tian and Pearl [26]. Our approach generalises disintegration, a calculational tool whereby a joint state on two variables is factored into a single-variable state and a channel, representing the marginal and conditional parts of the distribution, respectively. Disintegration has recently been formulated categorically in [5] and using string diagrams in [4]. We take the latter as a starting point, but instead consider a factorisation of a three-variable state into a channel and a *comb*. The latter is a special kind of map which allows inputs and outputs to be interleaved. Combs were originally studied in the context of quantum communication protocols, seen as games [8], but have recently been used extensively in the study of causally-ordered quantum [3,21] and generalised [15] processes. While originally conceived for quantum processes, the categorical formulation given in [15] makes sense in both the classical case (Stoch) and the quantum one. Much like Tian and Pearl's technique, comb factorisation allows one to characterise when the confounding parts of a causal structure are suitably isolated from each other, then exploit that isolation to perform the concrete calculation of interventional distributions.

However, unlike in the traditional formulation, the syntactic and semantic aspects of causal identifiability within our framework exactly mirror one another. Namely, we can give conditions for causal identifiability in terms of a factorisation of a morphism in Syn*G*, whereas the actual concrete computation of the interventional distribution involves a factorisation of its interpretation in Stoch. Thanks to the functorial semantics, the former immediately implies the latter.

To introduce the framework, we make use of a running example taken from Pearl's book [20]: identifying the causal effect of smoking on cancer with the help of an auxiliary variable (the presence of tar in the lungs). After providing some preliminaries on stochastic matrices and the functorial presentation of Bayesian networks in Sects. 2 and 3, we introduce the smoking example in Sect. 4. In Sect. 5 we formalise the notion of intervention as string diagram surgery, and in Sect. 6 we introduce combs and prove our main calculational result: the existence and uniqueness of comb factorisations. In Sect. 7, we show how to apply this theorem in computing the interventional distribution in the smoking example, and in Sect. 8, we show how this theorem can be applied in a more general case which captures (and slightly generalises) the conditions given in [26]. In Sect. 9, we conclude and describe several avenues of future work.

#### **2 Stochastic Matrices and Conditional Probabilities**

Symmetric monoidal categories (SMCs) give a very general setting for studying processes which can be composed in sequence (via the usual categorical composition ◦) and in parallel (via the monoidal composition ⊗). Throughout this paper, we will use *string diagram* notation [24] for depicting composition of morphisms in an SMC. In this notation, morphisms are depicted as boxes with labelled input and output wires, composition ◦ as 'plugging' boxes together, and the monoidal product ⊗ as placing boxes side-by-side. Identity morphisms are depicted simply as a wire and the unit I of ⊗ as the empty diagram. The 'symmetric' part of the structure consists of symmetry morphisms, which enable us to permute inputs and outputs arbitrarily; we depict these as wire-crossings. Morphisms whose domain is I are called *states*, and they will play a special role throughout this paper.

A monoidal category of prime interest in this paper is Stoch, whose objects are finite sets and whose morphisms $f : A \to B$ are $|B| \times |A|$ dimensional stochastic matrices. That is, they are matrices of non-negative real numbers whose columns each sum to 1:

$$f = \{ f_i^j \in \mathbb{R}^+ \mid i \in A,\; j \in B \} \qquad \text{with} \qquad \sum_j f_i^j = 1, \text{ for all } i.$$

Note we adopt the physicists' convention of writing row indices as superscripts and column indices as subscripts. Stochastic matrices are of interest for probabilistic reasoning, because they exactly capture the data of a conditional probability distribution. That is, if we take A := {1,...,m} and B := {1,...,n}, conditional probabilities naturally arrange themselves into a stochastic matrix:

$$f_i^j := P(B = j \mid A = i) \qquad\qquad f = \begin{pmatrix} P(B{=}1 \mid A{=}1) & \cdots & P(B{=}1 \mid A{=}m) \\ \vdots & \ddots & \vdots \\ P(B{=}n \mid A{=}1) & \cdots & P(B{=}n \mid A{=}m) \end{pmatrix}$$

States, i.e. stochastic matrices from a trivial input I := {∗}, are (nonconditional) probability distributions, represented as column vectors. There is only one stochastic matrix with trivial output: the row vector consisting only of 1's. The latter, with notation as on the right, will play a special role in this paper (see (1) below).

Composition of stochastic matrices is matrix multiplication. In terms of conditional probabilities, that is multiplication followed by marginalisation over the shared variable: $\sum_B P(C|B)P(B|A)$. Identities are thus given by identity matrices, which we will often express in terms of the Kronecker delta function $\delta^j_i$.
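This composition rule is easy to sanity-check numerically. The following sketch (with made-up conditional probabilities, using numpy) multiplies two stochastic matrices and confirms the composite is again stochastic:

```python
import numpy as np

# Hypothetical conditional distributions as stochastic matrices:
# columns are indexed by the conditioning variable, rows by the outcome.
f = np.array([[0.9, 0.2],
              [0.1, 0.8]])   # f[j, i] = P(B=j | A=i)
g = np.array([[0.7, 0.4],
              [0.3, 0.6]])   # g[k, j] = P(C=k | B=j)

# Composition in Stoch is matrix multiplication:
# (g ∘ f)[k, i] = Σ_j P(C=k | B=j) · P(B=j | A=i)
h = g @ f

assert np.isclose(h[0, 0], 0.7*0.9 + 0.4*0.1)  # marginalisation over B
assert np.allclose(h.sum(axis=0), 1.0)          # columns still sum to 1
```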

The monoidal product ⊗ in Stoch is the cartesian product on objects, and the Kronecker product of matrices: $(f \otimes g)^{(k,l)}_{(i,j)} := f^k_i\, g^l_j$. We will typically omit parentheses and commas in the indices, writing e.g. $h^{kl}_{ij}$ instead of $h^{(k,l)}_{(i,j)}$ for an arbitrary matrix entry of $h : A \otimes B \to C \otimes D$. In terms of conditional probabilities, the Kronecker product corresponds to taking product distributions. That is, if $f$ represents the conditional probabilities $P(B|A)$ and $g$ the probabilities $P(D|C)$, then $f \otimes g$ represents $P(B|A)P(D|C)$. Stoch also comes with a natural choice of 'swap' matrices $\sigma : A \otimes B \to B \otimes A$ given by $\sigma^{kl}_{ij} := \delta^l_i \delta^k_j$, making it into a symmetric monoidal category. Every object $A$ in Stoch has three further pieces of structure which will play a key role in our formulation of Bayesian networks and interventions: the *copy* map, the *discarding* map, and the *uniform state*:
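A small numpy sketch (with arbitrary example matrices) illustrates the Kronecker product index convention and the naturality of the swap:

```python
import numpy as np

f = np.array([[0.9, 0.2], [0.1, 0.8]])  # P(B|A)
g = np.array([[0.7, 0.4], [0.3, 0.6]])  # P(D|C)

# Monoidal product: Kronecker product of matrices.
# Entry (f⊗g)^{(k,l)}_{(i,j)} = f^k_i · g^l_j, with pairs flattened
# row-major: (k,l) ↦ 2*k + l for two-element sets.
fg = np.kron(f, g)
k, l, i, j = 1, 0, 0, 1
assert np.isclose(fg[2*k + l, 2*i + j], f[k, i] * g[l, j])

# Swap matrix σ : A⊗B → B⊗A with σ^{kl}_{ij} = δ^l_i δ^k_j
sigma = np.zeros((4, 4))
for i in range(2):
    for j in range(2):
        sigma[2*j + i, 2*i + j] = 1.0

# Naturality of the symmetry: σ ∘ (f ⊗ g) = (g ⊗ f) ∘ σ
assert np.allclose(sigma @ np.kron(f, g), np.kron(g, f) @ sigma)
```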

$$\big(\mathrm{copy}_A\big)^{jk}_{i} := \delta^{j}_{i}\delta^{k}_{i} \qquad\qquad \big(\mathrm{discard}_A\big)_{i} := 1 \qquad\qquad \big(\mathrm{unif}_A\big)^{i} := \frac{1}{|A|} \qquad\qquad \text{(1)}$$

Abstractly, this provides Stoch with the structure of a *CDU category*.

**Definition 2.1.** *A* CDU category *(for copy, discard, uniform) is a symmetric monoidal category* $(\mathcal{C}, \otimes, I)$ *where each object* $A$ *has a copy map* $A \to A \otimes A$*, a discarding map* $A \to I$*, and a uniform state* $I \to A$*, satisfying the following equations:*

CDU functors *are symmetric monoidal functors between CDU categories preserving copy maps, discard maps and uniform states.*

We assume that the CDU structure on I is trivial and the CDU structure on A ⊗ B is constructed in the obvious way from the structure on A and B. We also use the first equation in (2) to justify writing 'copy' maps with arbitrarily many output wires.

Similar to [2], we can form the free CDU category FreeCDU(X, Σ) over a pair (X, Σ) of a generating set of objects X and a generating set Σ of typed morphisms f : u → w, with u, w ∈ X, as follows. The category FreeCDU(X, Σ) has X as set of objects, and as morphisms the string diagrams constructed from the elements of Σ together with a copy map $x \to x \otimes x$, a discarding map $x \to I$ and a uniform state $I \to x$ for each $x \in X$, taken modulo the equations (2).

**Lemma 2.2.** Stoch *is a CDU category, with CDU structure defined as in* (1)*.*
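Lemma 2.2 can be checked concretely. The sketch below (our own encoding, for a hypothetical three-element object) builds the copy, discard and uniform matrices of (1) and verifies two of the CDU equations numerically:

```python
import numpy as np

n = 3  # a hypothetical three-element object A
copy = np.zeros((n*n, n))
for i in range(n):
    copy[n*i + i, i] = 1.0           # copy^{jk}_i = δ^j_i δ^k_i
discard = np.ones((1, n))             # the row vector of 1's
uniform = np.full((n, 1), 1.0 / n)    # the uniform state

# Counit laws: discarding either branch of a copy is the identity.
assert np.allclose(np.kron(discard, np.eye(n)) @ copy, np.eye(n))
assert np.allclose(np.kron(np.eye(n), discard) @ copy, np.eye(n))

# Discarding the uniform state yields the trivial state on I.
assert np.allclose(discard @ uniform, 1.0)
```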

An important feature of Stoch is that $I = \{\star\}$ is the final object, with the discarding map $B \to I$ provided by the universal property, for any set $B$. This yields Eq. (3), for any $f : A \to B$, justifying the name "discarding map". We conclude by recording another significant feature of Stoch: *disintegration* [4,5]. In probability theory, this is the mechanism of factoring a joint probability distribution P(AB) as a product of the first marginal P(A) and a conditional distribution P(B|A). We recall from [4] the string diagrammatic rendition of this process. We say that a morphism $f : X \to Y$ in Stoch has *full support* if, as a stochastic matrix, it has no zero entries. When *f* is a state, it is a standard result that full support ensures uniqueness of disintegrations of *f*.

**Proposition 2.3 (Disintegration).** *For any state $\omega : I \to A \otimes B$ with full support, there exist unique morphisms $a : I \to A$ and $b : A \to B$ such that:*

$$\omega \;=\; (\mathrm{id}_A \otimes b) \circ \mathrm{copy}_A \circ a, \qquad\text{i.e.}\qquad \omega^{ij} = a^{i}\, b^{j}_{i} \tag{4}$$

$$\mathrm{discard}_B \circ f \;=\; \mathrm{discard}_A \tag{3}$$

Note that Eq. (3) and the CDU rules immediately imply that the unique $a : I \to A$ in Proposition 2.3 is the marginal of $\omega$ onto $A$.
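Proposition 2.3 amounts to the familiar chain rule P(AB) = P(A)·P(B|A). A minimal numpy sketch over a hypothetical full-support joint state:

```python
import numpy as np

# A full-support joint state ω : I → A ⊗ B over two binary variables,
# stored as a vector indexed by (a, b) ↦ 2*a + b.
omega = np.array([0.3, 0.2, 0.1, 0.4])

a = np.array([omega[0] + omega[1], omega[2] + omega[3]])  # marginal P(A)

# Conditional b : A → B with b[j, i] = P(B=j | A=i)
b = np.array([[omega[0]/a[0], omega[2]/a[1]],
              [omega[1]/a[0], omega[3]/a[1]]])

# Recomposing as in (4): ω(i, j) = a^i · b^j_i recovers the joint state.
recomposed = np.array([a[i] * b[j, i] for i in range(2) for j in range(2)])
assert np.allclose(recomposed, omega)
```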

#### **3 Bayesian Networks as String Diagrams**

Bayesian networks are a widely used tool in probabilistic reasoning. They give a succinct representation of conditional (in)dependences between variables as a directed acyclic graph. Traditionally, a Bayesian network on a set of variables A, B, C, . . . is defined as a directed acyclic graph (dag) G, an assignment of sets to each of the nodes $V_G := \{A, B, C, \dots\}$ of G and a joint probability distribution over those variables which factorises as $P(V_G) = \prod_{A \in V_G} P(A \mid \mathrm{Pa}(A))$, where $\mathrm{Pa}(A)$ is the set of parents of A in G. Any joint distribution that factorises this way is said to satisfy the *global Markov property* with respect to the dag G. Alternatively, a Bayesian network can be seen as a dag equipped with a set of conditional probabilities $\{P(A \mid \mathrm{Pa}(A)) \mid A \in V_G\}$ which can be combined to form the joint state. Thanks to disintegration, these two perspectives are equivalent.
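The global Markov factorisation can be sketched directly. Below, a hypothetical two-node network A → B with made-up conditional probability tables; the joint is simply the product of the CPTs:

```python
import numpy as np
from itertools import product

# A hypothetical two-node network A → B with made-up CPTs.
P_A = {0: 0.6, 1: 0.4}
# Keys are (b, a), meaning P(B=b | A=a).
P_B_given_A = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.3, (1, 1): 0.7}

# Global Markov factorisation: P(A, B) = P(A) · P(B | A)
joint = {(a, b): P_A[a] * P_B_given_A[(b, a)]
         for a, b in product([0, 1], repeat=2)}

assert np.isclose(sum(joint.values()), 1.0)  # a genuine joint distribution
```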

Much like in the case of disintegration in the previous section, Bayesian networks have a neat categorical description as string diagrams in the category Stoch [7,13,14]. For example, here is a Bayesian network in its traditional depiction as a dag with an associated joint distribution over its vertices, and as a string diagram in Stoch:

In the string diagram above, the stochastic matrix $a : I \to A$ contains the probabilities P(A), $b : A \to B$ contains the conditional probabilities P(B|A), $c : B \otimes D \to C$ contains P(C|BD), and so on. The entire diagram is then equal to a state $\omega : I \to A \otimes B \otimes C \otimes D \otimes E$ which represents P(ABCDE).

Note the dag and the diagram above look similar in structure. The main difference is the use of copy maps to make each variable (even those that are not leaves of the dag, A, B and D) an output of the overall diagram. This corresponds to a variable being *observed*. We can also consider Bayesian networks with *latent* variables, which do not appear in the joint distribution due to marginalisation. Continuing the example above, making A into a latent variable yields the following depiction as a string diagram:

In general, a Bayesian network (with possible latent variables), is a string diagram in Stoch that (1) only has outputs and (2) consists only of copy maps and boxes which each have exactly one output.


By 'a string diagram in Stoch', we mean not only the stochastic matrix itself, but also its decomposition into components. We can formalise exactly what we mean by taking a perspective on Bayesian networks which draws inspiration from Lawvere's functorial semantics of algebraic theories [16]. In this perspective, which elaborates on [7, Ch. 4], we maintain a conceptual distinction between the purely syntactic object (the diagram) and its probabilistic interpretation.

Starting from a dag <sup>G</sup> = (V<sup>G</sup>, E<sup>G</sup>), we construct a free CDU category Syn*<sup>G</sup>* which provides the syntax of causal structures labelled by G. The objects of Syn*<sup>G</sup>* are generated by the vertices of <sup>G</sup>, whereas the morphisms are generated by the following signature:

$$\Sigma_G = \left\{\; a_A : B_1 \otimes \cdots \otimes B_k \to A \;\middle|\; A \in V_G \text{ with parents } B_1, \dots, B_k \in V_G \;\right\}$$

Then Syn*<sup>G</sup>* := FreeCDU(V<sup>G</sup>, Σ<sup>G</sup>).<sup>1</sup> The following result establishes that models (*`a la* Lawvere) of Syn*<sup>G</sup>* coincide with <sup>G</sup>-based Bayesian networks.

**Proposition 3.1.** *There is a 1-1 correspondence between Bayesian networks based on the dag* <sup>G</sup> *and CDU functors of type* Syn*<sup>G</sup>* <sup>→</sup> Stoch*.*

We refer to [12] for a proof. This proposition justifies the following definition of a category BN*<sup>G</sup>* of <sup>G</sup>-based Bayesian networks: objects are CDU functors Syn*<sup>G</sup>* → Stoch and arrows are monoidal natural transformations between them.

#### **4 Towards Causal Inference: The Smoking Scenario**

We will motivate our approach to causal inference via a classic example, inspired by the one given in Pearl's book [20]. Imagine a dispute between a scientist and a tobacco company. The scientist claims that smoking causes cancer. As a source of evidence, the scientist cites a joint probability distribution ω over variables S for smoking and C for cancer, which disintegrates as in (5) below,

<sup>1</sup> Note that <sup>E</sup>*<sup>G</sup>* is implicitly used in the construction of Syn*G*: the edges of <sup>G</sup> determine the parents of a vertex, and hence the input types of the symbols in Σ*<sup>G</sup>*.

with matrix $c = \begin{pmatrix} 0.9 & 0.7 \\ 0.1 & 0.3 \end{pmatrix}$. Inspecting this $c : S \to C$, the scientist notes that the probability of getting cancer for smokers (0.3) is three times as high as for non-smokers (0.1). Hence, the scientist claims that smoking has a significant causal effect on cancer.

An important thing to stress here is that the scientist draws this conclusion not only from the observational data *ω* but also from an assumed *causal structure* which gave rise to that data, as captured in the diagram in Eq. (5). That is, rather than treating diagram (5) simply as a calculation on the observational data, it can also be treated as an assumption about the actual, physical mechanism that gave rise to that data. Namely, this diagram encompasses the assumption that there is some prior propensity for people to smoke, captured by $s : I \to S$, which is both observed and fed into some other process $c : S \to C$ whereby an individual's choice to smoke determines whether or not they get cancer.

The tobacco company, in turn, says that the scientist's assumptions about the provenance of this data are too strong. While they concede that *in principle* it is possible for smoking to have some influence on cancer, the scientist should allow for the possibility that there is some latent common cause (e.g. genetic conditions, stressful work environment, etc.) which leads people both to smoke and to get cancer. Hence, says the tobacco company, a 'more honest' causal structure to ascribe to the data ω is (6). This structure then allows for either party to be correct. If the scientist is right, the output of $c : S \otimes H \to C$ depends mostly on its first input, i.e. the causal path from smoking to cancer. If the tobacco company is right, then *c* depends very little on its first input, and the correlation between S and C can be explained almost entirely by the hidden common cause.

So, who is right after all? Just from the observed distribution *ω*, it is impossible to tell. So, the scientist proposes a clinical trial, in which patients are randomly required to smoke or not to smoke. We can model this situation by replacing *s* in (6) with a process that ignores its inputs and outputs the uniform state. Graphically, this looks like 'cutting' the link *<sup>s</sup>* between H and S:

This captures the fact that the variable S is now randomised and no longer dependent on any background factors. This new distribution $\omega'$ represents the data

*(Equation (6): the weaker causal structure, in which ω factors through a latent common cause $h : I \to H$, copied and fed into $s : H \to S$ and $c : S \otimes H \to C$.)*

the scientist would have obtained had they run the trial. That is, it gives the results of an *intervention* at *s*. If this $\omega'$ *still* shows a strong correlation between smoking and cancer, one can conclude that smoking indeed causes cancer even under the weaker causal structure (6).

Unsurprisingly, the scientist fails to get ethical approval to run the trial, and hence has only the observational data *ω* to work with. Given that the scientist only knows *ω* (and not *c* and *h*), there is no way to compute $\omega'$ in this case. However, a key insight of statistical causal inference is that sometimes it *is* possible to compute interventional distributions from observational ones. Continuing the smoking example, suppose the scientist proposes the following revision to the causal structure: they posit a structure (8) that includes a third observed variable (the presence T of tar in the lungs), which completely mediates the causal effect of smoking on cancer.

As with our simpler structure, the diagram (8) contains some assumptions about the provenance of the data *ω*. In particular, by omitting wires, we are asserting there is no *direct* causal link between certain variables. The absence of an H-labelled input to *t* says there is no direct causal link from H to T (only one mediated by S), and the absence of an S-labelled input wire into *c* captures that there is no direct causal link from S to C (only one mediated by T). In the traditional approach to causal inference, such relationships are typically captured by a graph-theoretic property called *d-separation* on the dag associated with the causal structure.

We can again imagine intervening at S, by replacing $s : H \to S$ with the discarding map on H followed by the uniform state on S. Again, this 'cutting' of the diagram will result in a new interventional distribution $\omega'$. However, unlike before, it *is* possible to compute this distribution from the observational distribution *ω*.

However, in order to do that, we first need to develop the appropriate categorical framework. In Sect. 5, we will model 'cutting' as a functor. In Sect. 6, we will introduce a generalisation of disintegration, which we call *comb disintegration*. These tools will enable us to compute $\omega'$ from *ω*, in Sect. 7.

#### **5 Interventional Distributions as Diagram Surgery**

The goal of this section is to define the 'cut' operation in (7) as an endofunctor on the category of Bayesian networks. First, we observe that such an operation exclusively concerns the string diagram part of a Bayesian network: following the functorial semantics given in Sect. 3, it is thus appropriate to define cut as an endofunctor on Syn*G*, for a given dag <sup>G</sup>.

**Definition 5.1.** *For a fixed node* <sup>A</sup> <sup>∈</sup> <sup>V</sup><sup>G</sup> *in a graph* <sup>G</sup>*, let* cut*<sup>A</sup>* : Syn*<sup>G</sup>* <sup>→</sup> Syn*<sup>G</sup> be the CDU functor freely obtained by the following action on the generators* (VG, ΣG) *of* Syn*G:*

*– For each object* <sup>B</sup> <sup>∈</sup> <sup>V</sup>G*,* cut*A*(B) = B*.*

*–* cut*<sup>A</sup>* *sends the generator* $a : B_1 \otimes \cdots \otimes B_k \to A$ *to the morphism that discards all of its inputs and outputs the uniform state on* A*; every other generator* $b \in \Sigma_G$ *is mapped to itself.*

Intuitively, cut*<sup>A</sup>* applied to a string diagram f of Syn*<sup>G</sup>* replaces each occurrence of the box with output wire of type A by the discarding of its inputs together with the uniform state on A.

Proposition 3.1 allows us to "transport" the cutting operation over to Bayesian networks. Given any Bayesian network based on <sup>G</sup>, let <sup>F</sup> : Syn*<sup>G</sup>* <sup>→</sup> Stoch be the corresponding CDU functor given by Proposition 3.1. Then, we can define its A-cutting as the Bayesian network identified by the CDU functor F ◦ cut*A*. This yields an (idempotent) endofunctor Cut*<sup>A</sup>* : BN*<sup>G</sup>* → BN*G*.
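Concretely, Cut*<sup>A</sup>* acts on a Bayesian network, presented as an assignment of stochastic matrices to nodes, by replacing one node's mechanism with the uniform state. The helper below is our own illustrative encoding, not the paper's implementation:

```python
import numpy as np

def cut(network, node, domain_sizes):
    """Replace `node`'s conditional with the uniform state, ignoring parents.

    `network` maps each node name to a stochastic matrix whose columns are
    indexed by joint parent values (a column vector for parentless nodes).
    """
    new = dict(network)
    n = domain_sizes[node]
    cols = network[node].shape[1]
    new[node] = np.full((n, cols), 1.0 / n)  # discard parents, output uniform
    return new

# A hypothetical chain H → S → C (no direct H → C edge, for illustration).
net = {"H": np.array([[0.8], [0.2]]),
       "S": np.array([[0.7, 0.2], [0.3, 0.8]]),   # P(S|H)
       "C": np.array([[0.9, 0.4], [0.1, 0.6]])}   # P(C|S)

cut_net = cut(net, "S", {"H": 2, "S": 2, "C": 2})
assert np.allclose(cut_net["S"], 0.5)             # S now uniform, ignores H
assert cut_net["C"] is net["C"]                   # other mechanisms untouched

# Cutting is idempotent, mirroring Cut_A on BN_G.
again = cut(cut_net, "S", {"H": 2, "S": 2, "C": 2})
assert np.allclose(again["S"], cut_net["S"])
```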

### **6 The Comb Factorisation**

Thanks to the developments of Sect. 5, we can understand the transition from left to right in (7) as applying the functor Cut*<sup>S</sup>* for the 'Smoking' node S. The next step is to actually compute the individual Stoch-morphisms appearing in (8), in order to answer the causality question.

In order to do that, we want to work in a setting where $t : S \to T$ can be isolated and 'extracted' from (8). What is left behind is a stochastic matrix with a 'hole' where *t* has been extracted. To define 'morphisms with holes', it is convenient to pass from SMCs to compact closed categories (see e.g. [24]). Stoch is not itself compact closed, but it embeds into $\mathrm{Mat}(\mathbb{R}^+)$, whose morphisms are *all* matrices of non-negative numbers. $\mathrm{Mat}(\mathbb{R}^+)$ has a (self-dual) compact closed structure; that means, for any set A there is a 'cap' $\cap : A \otimes A \to I$ and a 'cup' $\cup : I \to A \otimes A$ which satisfy the 'yanking' equations. As matrices, caps and cups are given by $\cap_{ij} = \cup^{ij} = \delta^j_i$. Intuitively, they amount to 'bent' identity wires. Another useful aspect of $\mathrm{Mat}(\mathbb{R}^+)$ is the following handy characterisation of the subcategory Stoch.
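Caps and cups are easy to realise as matrices. The sketch below builds them for a two-element set and checks one yanking equation:

```python
import numpy as np

n = 2
cap = np.zeros((1, n*n))   # ∩ : A⊗A → I
cup = np.zeros((n*n, 1))   # ∪ : I → A⊗A
for i in range(n):
    cap[0, n*i + i] = 1.0
    cup[n*i + i, 0] = 1.0

# One yanking equation: (∩ ⊗ id) ∘ (id ⊗ ∪) = id
lhs = np.kron(cap, np.eye(n)) @ np.kron(np.eye(n), cup)
assert np.allclose(lhs, np.eye(n))
```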

**Lemma 6.1.** *A morphism <sup>f</sup>* : A <sup>→</sup> B *in* Mat(R+) *is a stochastic matrix (thus a morphism of* Stoch*) if and only if* (3) *holds.*

A suitable notion of 'stochastic map with a hole' is provided by a *comb*. These structures originate in the study of certain kinds of quantum channels [3].

**Definition 6.2.** *A* 2*-comb in* Stoch *is a morphism* $f : A_1 \otimes A_2 \to B_1 \otimes B_2$ *satisfying, for some morphism* $f' : A_1 \to B_1$*,*

$$(\mathrm{id}_{B_1} \otimes \mathrm{discard}_{B_2}) \circ f \;=\; f' \otimes \mathrm{discard}_{A_2} \tag{9}$$

This definition extends inductively to n*-combs*, where we require that discarding the rightmost output yields $f' \otimes \mathrm{discard}$, for some $(n-1)$-comb $f'$. However, for our purposes, restricting to 2-combs will suffice.

The intuition behind condition (9) is that the contribution from input A<sup>2</sup> is only visible via output B<sup>2</sup>. Thus, if we discard B<sup>2</sup> we may as well discard A<sup>2</sup>. In other words, the input/output pair A<sup>2</sup>, B<sup>2</sup> happens 'after' the pair A<sup>1</sup>, B<sup>1</sup>. Hence, it is typical to depict 2-combs in the shape of a (hair) comb, with 2 'teeth', as in (10) below:

While combs themselves live in Stoch, $\mathrm{Mat}(\mathbb{R}^+)$ accommodates a second-order reading of the transition in (10): we can treat *f* as a map which expects as input a map $g : B_1 \to A_2$ and produces as output a map of type $A_1 \to B_2$. Plugging $g : B_1 \to A_2$ into the 2-comb can be formally defined in $\mathrm{Mat}(\mathbb{R}^+)$ by composing *f* and *g* in the usual way, then feeding the output of *g* into the second input of *f*, using caps and cups, as in (11).

Importantly, for generic *f* and *g* of Stoch, there is no guarantee that forming the composite (11) in Mat(R+) yields a valid Stoch-morphism, i.e. a morphism satisfying the finality Eq. (3). However, if *f* is a 2-comb and *g* is a Stochmorphism, Eq. (9) enables a discarding map plugged into the output <sup>B</sup><sup>2</sup> in (11) to 'fall through' the right side of *f*, which guarantees that the composed map satisfies the finality equation for discarding. See [12, § ??] for the explicit diagram calculation.
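The comb condition (9) and the effect of plugging (11) can both be checked with explicit matrices. In the sketch below (with made-up probabilities), a 2-comb is built as f(b₁,b₂|a₁,a₂) = p(b₁|a₁)·q(b₂|a₁,a₂); plugging a channel g into it then yields a stochastic map, as claimed:

```python
import numpy as np

p = np.array([[0.8, 0.3],
              [0.2, 0.7]])            # p[b1, a1]
q = np.array([[0.6, 0.1, 0.5, 0.9],
              [0.4, 0.9, 0.5, 0.1]])  # q[b2, 2*a1 + a2]

# A 2-comb f : A1⊗A2 → B1⊗B2 with f(b1,b2 | a1,a2) = p(b1|a1)·q(b2|a1,a2)
f = np.zeros((4, 4))
for a1 in range(2):
    for a2 in range(2):
        for b1 in range(2):
            for b2 in range(2):
                f[2*b1 + b2, 2*a1 + a2] = p[b1, a1] * q[b2, 2*a1 + a2]

# Comb condition (9): discarding B2 makes the result independent of A2.
discard_B2 = np.kron(np.eye(2), np.ones((1, 2)))
top = discard_B2 @ f
assert np.allclose(top[:, 0], top[:, 1])  # a1 = 0, a2 ∈ {0, 1}
assert np.allclose(top[:, 2], top[:, 3])  # a1 = 1, a2 ∈ {0, 1}

# Plugging g : B1 → A2 into the comb, as in (11), gives a map A1 → B2.
g = np.array([[0.25, 0.6],
              [0.75, 0.4]])
plugged = np.zeros((2, 2))
for a1 in range(2):
    for b2 in range(2):
        plugged[b2, a1] = sum(p[b1, a1] * g[a2, b1] * q[b2, 2*a1 + a2]
                              for b1 in range(2) for a2 in range(2))
assert np.allclose(plugged.sum(axis=0), 1.0)  # the composite is stochastic
```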

With the concept of 2-combs in hand, we can state our factorisation result.

**Theorem 6.3.** *For any state* $\omega : I \to A \otimes B \otimes C$ *of* Stoch *with full support, there exist a unique 2-comb* $f : B \to A \otimes C$ *and a unique stochastic matrix* $g : A \to B$ *such that, in* $\mathrm{Mat}(\mathbb{R}^+)$*:*

$$\omega^{abc} \;=\; g^{b}_{a}\, f^{ac}_{b} \tag{12}$$

*(entrywise form of (12): ω is recovered by plugging g into the gap of the comb f, with the intermediate wires A and B copied to the outputs)*

*Proof.* The construction of *f* and *g* mimics the one of c-factors in [26], using string diagrams and (diagrammatic) disintegration. We first use *ω* to construct maps *<sup>a</sup>* : I <sup>→</sup> A, *<sup>b</sup>* : A <sup>→</sup> B, *<sup>c</sup>* : A <sup>⊗</sup> B <sup>→</sup> C, then construct *<sup>f</sup>* using *<sup>a</sup>* and *<sup>c</sup>* and construct *g* using *b*. For the full proof, including uniqueness, see [12].

Note that Theorem 6.3 generalises the normal disintegration property given in Proposition 2.3. The latter is recovered by taking A := I (or C := I) above.
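The construction in the proof can be sketched numerically: for a random full-support ω over three binary variables, build P(A), P(B|A) and P(C|AB), set g(b|a) = P(b|a) and f(a,c|b) = P(a)·P(c|a,b), then check both the comb condition and the recomposition (12). Variable names here are our own:

```python
import numpy as np
from itertools import product

# A full-support state ω : I → A⊗B⊗C over binary variables, ω[a, b, c].
rng = np.random.default_rng(0)
omega = rng.random((2, 2, 2))
omega /= omega.sum()

P_A = omega.sum(axis=(1, 2))
P_B_given_A = omega.sum(axis=2) / P_A[:, None]            # [a, b] = P(b|a)
P_C_given_AB = omega / omega.sum(axis=2, keepdims=True)   # [a, b, c] = P(c|a,b)

# Comb disintegration: g(b|a) = P(b|a);  f(a,c|b) = P(a)·P(c|a,b)
g = {(b, a): P_B_given_A[a, b] for a, b in product(range(2), repeat=2)}
f = {(a, c, b): P_A[a] * P_C_given_AB[a, b, c]
     for a, b, c in product(range(2), repeat=3)}

# f is a 2-comb: discarding C leaves P(a), independent of the input b.
assert all(np.isclose(sum(f[(a, c, b)] for c in range(2)), P_A[a])
           for a in range(2) for b in range(2))

# Recomposition (12): plugging g back into the comb f recovers ω.
for a, b, c in product(range(2), repeat=3):
    assert np.isclose(g[(b, a)] * f[(a, c, b)], omega[a, b, c])
```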

### **7 Returning to the Smoking Scenario**

We now return to the smoking scenario of Sect. 4. There, we concluded by claiming that the introduction of an intermediate variable T to the observational distribution $\omega : I \to S \otimes T \otimes C$ would enable us to calculate the interventional distribution. That is, we can calculate $\omega' = F(\mathrm{cut}_S(\omega))$ from $\omega := F(\omega)$. Thanks to Theorem 6.3, we are now able to perform that calculation. We first observe that our assumed causal structure for *ω* fits the form of Theorem 6.3, where *g* is *t* and *f* is a 2-comb containing everything else, as in the diagram on the side.

Hence, *f* and *g* are computable from *ω*. If we plug them back together as in (12), we will get *ω* back. However, if we insert a 'cut' between *f* and *g*:

we obtain $\omega' = F(\mathrm{cut}_S(\omega))$.

We now consider a concrete example. Fix interpretations S <sup>=</sup> T <sup>=</sup> C <sup>=</sup> {0, <sup>1</sup>} and let *<sup>ω</sup>* : I <sup>→</sup> S <sup>⊗</sup> T <sup>⊗</sup> C be the stochastic matrix:

$$
\omega := \begin{pmatrix} 0.5 \\ 0.1 \\ 0.01 \\ 0.02 \\ 0.1 \\ 0.05 \\ 0.02 \\ 0.2 \end{pmatrix} \begin{matrix} \leftarrow P(S{=}0, T{=}0, C{=}0) \\ \leftarrow P(S{=}0, T{=}0, C{=}1) \\ \leftarrow P(S{=}0, T{=}1, C{=}0) \\ \leftarrow P(S{=}0, T{=}1, C{=}1) \\ \leftarrow P(S{=}1, T{=}0, C{=}0) \\ \leftarrow P(S{=}1, T{=}0, C{=}1) \\ \leftarrow P(S{=}1, T{=}1, C{=}0) \\ \leftarrow P(S{=}1, T{=}1, C{=}1) \end{matrix}
$$

Now, disintegrating *ω* into a state $s : I \to S$ and a channel $c : S \to C$ (marginalising over T) gives

$$c \approx \begin{pmatrix} 0.81 & 0.32 \\ 0.19 & 0.68 \end{pmatrix}$$

The bottom-left element of *<sup>c</sup>* is P(C = 1|S = 0), whereas the bottom-right is P(C = 1|S = 1), so this suggests that patients are <sup>≈</sup>3.5 times as likely to get cancer if they smoke (68% vs. 19%). However, comb-disintegrating *ω* using Theorem 6.3 gives *<sup>g</sup>* : S <sup>→</sup> T and a comb *<sup>f</sup>* : T <sup>→</sup> S <sup>⊗</sup> C with the following stochastic matrices:

$$\mathbf{f} \approx \begin{pmatrix} 0.53 & 0.21 \\ 0.11 & 0.42 \\ 0.25 & 0.03 \\ 0.12 & 0.34 \end{pmatrix} \qquad \qquad \mathbf{g} \approx \begin{pmatrix} 0.95 & 0.41 \\ 0.05 & 0.59 \end{pmatrix}.$$

Recomposing these with a 'cut' in between, as in the left-hand side of (13), gives the interventional distribution $\omega' \approx (0.38, 0.11, 0.01, 0.02, 0.16, 0.05, 0.07, 0.22)$. Disintegrating $\omega'$ in turn gives $c' \approx \begin{pmatrix} 0.75 & 0.46 \\ 0.25 & 0.54 \end{pmatrix}$.

From the interventional distribution, we conclude that, in a (hypothetical) clinical trial, patients are about twice as likely to get cancer if they smoke (54% vs. 25%). So, since 54 < 68, there was *some* confounding influence between S and C in our observational data, but after removing it via comb disintegration, we see there is still a significant causal link between smoking and cancer.
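The entire calculation of this section fits in a few lines of numpy. The sketch below (our own encoding of the example data, not the authors' published code) recovers both the observational ratio (68% vs. 19%) and the interventional one (54% vs. 25%):

```python
import numpy as np
from itertools import product

# Observational distribution ω over (S, T, C), indexed omega[s, t, c].
omega = np.zeros((2, 2, 2))
omega[0, 0, 0], omega[0, 0, 1], omega[0, 1, 0], omega[0, 1, 1] = 0.5, 0.1, 0.01, 0.02
omega[1, 0, 0], omega[1, 0, 1], omega[1, 1, 0], omega[1, 1, 1] = 0.1, 0.05, 0.02, 0.2

P_S = omega.sum(axis=(1, 2))
P_T_given_S = omega.sum(axis=2) / P_S[:, None]
P_C_given_ST = omega / omega.sum(axis=2, keepdims=True)

# Naïve observational conditionals P(C=1|S): the 68% vs. 19% of the text.
assert round(omega[1, :, 1].sum() / P_S[1], 2) == 0.68
assert round(omega[0, :, 1].sum() / P_S[0], 2) == 0.19

# Interventional distribution: S is randomised (uniform), T follows g = P(T|S),
# and C follows the comb f with its S-marginal summed out: Σ_s' P(s')P(c|s',t).
omega_cut = np.zeros((2, 2, 2))
for s, t, c in product(range(2), repeat=3):
    omega_cut[s, t, c] = 0.5 * P_T_given_S[s, t] * sum(
        P_S[s2] * P_C_given_ST[s2, t, c] for s2 in range(2))
assert np.isclose(omega_cut.sum(), 1.0)

P_C1_do_smoke = omega_cut[1, :, 1].sum() / omega_cut[1].sum()
P_C1_do_nosmoke = omega_cut[0, :, 1].sum() / omega_cut[0].sum()
assert round(P_C1_do_smoke, 2) == 0.54
assert round(P_C1_do_nosmoke, 2) == 0.25
```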

Note this conclusion depends totally on the particular observational data that we picked. For a different interpretation of *ω* in Stoch, one might conclude that there is *no* causal connection, or even that smoking *decreases* the chance of getting cancer. Interestingly, all three cases can arise even when a naïve analysis of the data shows a strong direct correlation between S and C. To see and/or experiment with these cases, we have provided the Python code<sup>2</sup> used to perform these calculations. See also [19] for a pedagogical overview of this example (using traditional Bayesian network language) with some sample calculations.

#### **8 The General Case for a Single Intervention**

While we applied the comb decomposition to a particular example, this technique applies essentially unmodified to many examples where we intervene at a single variable (called X below) within an arbitrary causal structure.

<sup>2</sup> https://gist.github.com/akissinger/aeec1751792a208253bda491ead587b6.

**Theorem 8.1.** *Let* G *be a dag with a fixed node* X *that has corresponding generator* x : Y<sub>1</sub> ⊗ · · · ⊗ Y<sub>n</sub> → X *in* Syn*G*. *Suppose* ω *is a morphism in* Syn*G* *of the following form:*

*(14): [string diagram equation, not reproduced: ω factors as a 2-comb built from f1 and f2 around the generator x, with g plugged into the comb's hole]*

*for some morphisms* f1, f2 *and* g *in* Syn*G* *not containing* x *as a subdiagram. Then the interventional distribution* F(cut*X*(ω)) *is computable from the observational distribution* F(ω)*.*

*Proof.* The proof is very close to the example in the previous section. Interpreting ω in Stoch, we get a diagram of stochastic maps, which we can comb-disintegrate and then recompose with a 'cut' in between to produce the interventional distribution:

The RHS above is then <sup>F</sup>(cut*X*(ω)).

This is general enough to cover several well-known sufficient conditions from the causality literature, including single-variable versions of the so-called *front-door* and *back-door* criteria, as well as the sufficient condition based on confounding paths given by Pearl and Tian [26]. As the latter subsumes the other two, we will say a few words about the relationship between the Pearl/Tian condition and Theorem 8.1. In [26], the authors focus on *semi-Markovian* models, where the only latent variables have exactly two observed children and no parents. Writing A ↔ B when two observed variables are connected by a latent common cause, one can characterise *confounding paths* as the transitive closure of ↔. They go on to show that the interventional distribution corresponding to cutting X is computable whenever no confounding path connects X to one of its children.
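The Pearl/Tian condition is mechanical to check: take the transitive closure of ↔ and test whether X shares a confounding component with one of its children. A small sketch (ours; the function names and the encoding of graphs are hypothetical):

```python
def confounded_components(nodes, bidirected):
    """Connected components of the <-> (latent common cause) relation,
    i.e. the transitive closure giving the confounding paths."""
    comp = {v: {v} for v in nodes}
    for a, b in bidirected:
        merged = comp[a] | comp[b]
        for v in merged:          # re-point every member at the merged set
            comp[v] = merged
    return comp

def cut_is_identifiable(x, children, nodes, bidirected):
    """The interventional distribution for cutting x is computable when
    no confounding path links x to one of its children."""
    comp = confounded_components(nodes, bidirected)
    return not any(ch in comp[x] for ch in children)

# Smoking example: S -> T -> C, with S <-> C via a hidden common cause.
nodes = {"S", "T", "C"}
print(cut_is_identifiable("S", {"T"}, nodes, [("S", "C")]))  # True
print(cut_is_identifiable("S", {"C"}, nodes, [("S", "C")]))  # False
```

In the smoking example the only child of S is T, and the confounding component of S is {S, C}; since T is not in it, the cut at S is identifiable, consistent with the computation of Section 7.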

We can compare this to the form of the expression ω in Eq. (14). First, note this factorisation implies that all boxes which take X as an input must occur as subdiagrams of g. Hence, any 'confounding path' connecting X to its children would yield at least one (un-copied) wire from f1 to g, so ω could not be factored as in (14). Conversely, if there are no confounding paths from X to its children, then we can place the boxes involved in any other confounding path either entirely inside of g or entirely outside of g and obtain factorisation (14). Hence, restricting to semi-Markovian models, the no-confounding-path condition from [26] is equivalent to ours. However, Theorem 8.1 is slightly more general: its formulation doesn't rely on the causal structure ω being semi-Markovian.

#### **9 Conclusion and Future Work**

This paper takes a fresh, systematic look at the problem of causal identifiability. By clearly distinguishing syntax (string diagram surgery and identification of comb shapes) and semantics (comb-disintegration of joint states) we obtain a clear methodology for computing interventional distributions, and hence causal effects, from observational data.

A natural next step is moving beyond single-variable interventions to the general case, i.e. situations where we allow interventions on multiple variables which may have some arbitrary causal relationships connecting them. This would mean extending the comb factorisation Theorem 6.3 from a 2-comb and a channel to arbitrary n-combs. This seems to be straightforward, via an inductive extension of the proof of Theorem 6.3. A more substantial direction of future work will be the strengthening of Theorem 8.1 from sufficient conditions for causal identifiability to a full characterisation. Indeed, the related condition based on confounding paths from [26] is a necessary and sufficient condition for computing the interventional distribution on a single variable. Hence, it will be interesting to formalise this necessity proof (and more general versions, e.g. [10]) within our framework and investigate, for example, the extent to which it holds beyond the semi-Markovian case.

While we focus exclusively on the case of taking models in Stoch in this paper, the techniques we gave are posed at an abstract level in terms of composition and factorisation. Hence, we are optimistic about their prospects to generalise to other probabilistic (e.g. infinite discrete and continuous variables) and quantum settings. In the latter case, this could provide insights into the emerging field of *quantum causal structures* [6,9,18,22,23], which attempts in part to replay some of the results coming from statistical causal reasoning, but where quantum processes play a role analogous to stochastic ones. A key difficulty in applying our framework to a category of quantum processes, rather than Stoch, is the unavailability of 'copy' morphisms due to the quantum no-cloning theorem [27]. However, a recent proposal for the formulation of 'quantum common causes' [1] suggests a (partially-defined) analogue to the role played by 'copy' in our formulation constructed via multiplication of certain commuting Choi matrices. Hence, it may yet be possible to import results from classical causal reasoning into the quantum case just by changing the category of models.

**Acknowledgements.** FZ acknowledges support from EPSRC grant EP/R020604/1. AK would like to thank Tom Claassen for useful discussions on causal identification criteria.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Higher-Order Distributions for Differential Linear Logic**

Marie Kerjean<sup>1</sup> and Jean-Simon Pacaud Lemay<sup>2</sup>

<sup>1</sup> Équipe Gallinette, Inria, LS2N, Nantes, France
marie.kerjean@inria.fr
<sup>2</sup> University of Oxford, Oxford, UK
jean-simon.lemay@kellogg.ox.ac.uk

**Abstract.** Linear Logic was introduced as the computational counterpart of the algebraic notion of linearity. Differential Linear Logic refines Linear Logic with a proof-theoretical interpretation of the geometrical process of differentiation. In this article, we construct a polarized model of Differential Linear Logic satisfying computational constraints, such as an interpretation for higher-order functions, as well as constraints inherited from physics, such as a continuous interpretation for spaces. This extends what was done previously by Kerjean for first-order Differential Linear Logic without promotion. Concretely, we follow the previous idea of interpreting the exponential of Differential Linear Logic as a space of higher-order distributions with compact support, constructed as an inductive limit of spaces of distributions on Euclidean spaces. We prove that this exponential is endowed with a co-monad-like structure, with the notable exception that it is functorial only on isomorphisms. Interestingly, as previously argued by Ehrhard, this still allows the interpretation of Differential Linear Logic without promotion.

**Keywords:** Differential Linear Logic · Categorical semantics · Topological vector spaces

# **1 Introduction**

Denotational semantics interprets programs as functions, focusing not on how data from these programs are computed, but rather on the input/output behaviour of programs and on data computed from other data [19]. Through the Curry-Howard-Lambek correspondence, this approach refines into the categorical semantics of type systems. In particular, a study of the denotational model of the λ-calculus for coherent spaces led Girard to Linear Logic [9] and to the understanding of the use of resources as the computational counterpart of

Marie Kerjean was supported by the ANR Rapido, and would like to thank Tom Hirschowitz for many comments and discussions on this work. Jean-Simon Pacaud Lemay would like to thank Kellogg College, the Clarendon Fund, and the Oxford-Google DeepMind Graduate Scholarship for financial support.

linearity in algebra. Differential Linear Logic (DiLL) [7] is a refinement of Linear Logic which allows for a notion of linear approximation of non-linear proofs. As a proof-net calculus, DiLL originated from the study of vectorial models of Linear Logic, which are in general based on spaces of sequences, such as Köthe spaces and finiteness spaces [5].

Recently the first author argued in [14] that, as a sequent calculus, DiLL has a "smooth" semantical interpretation where the exponential ! (the central object of Linear Logic) is interpreted as a space of *distributions with compact support* [18]. This semantical interpretation of DiLL (along with the Linear Logic typed phenomena of duality and interaction) provides a strong argument that DiLL should be considered as a foundation for a type theory of differential equations, whose semantics would be based on structures developed for mathematical physics. However, one of the many divergences between the theoretical study of physical systems and the theoretical study of programming languages lies in the treatment of input data. In the study of differential equations, one generally accepts only a finite number of parameters: typically time and space [1]. By contrast, one of the fundamental aspects of the semantics of functional programming languages is the concept of higher-order types [4], which in particular allows programs to take other programs as inputs. Linking these two concepts requires that, where mathematical physics studies functions with finite dimensional domains, the denotational semantical counterpart studies functions whose domains are spaces of functions (which are in general far from being finite dimensional).

This article gives a higher-order notion of distributions with compact support, following the model without higher order constructed by the first author in [14]. Indeed, only functions whose domains are finite dimensional were defined in [14], while no interpretation was given for functions whose domains are spaces of smooth functions. This latter notion relies on the basic intuition that even with a continuous and infinite set of input data, a program will at each computation use only a *finite* amount of data.

*Content and Related Work.* In this paper, we interpret the exponential as an inductive limit of spaces of distributions with compact support (Definition 7). Non-linear proofs are thus interpreted as elements of a projective limit of spaces of smooth functions. In [3], Blute, Cockett, and Seely construct a general interpretation of an exponential as a *projective* limit of more basic spaces. In [13], Kriegl and Michor construct the free C<sup>∞</sup>-ring over a set X (thus a space of smooth functions) as a projective limit of spaces of smooth functions between Euclidean spaces. Our work differs in that we reverse the use of projective and inductive limits in defining the exponential, and in that we use a finer indexation than the one used in [3,13]. The reversed use of limits is motivated by the fact that we are cautious about *polarities* [16], while the finer indexing is needed for topological considerations. Indeed, we need to carefully consider the functoriality of the exponential and the topology on the objects.

*Context.* Differential Linear Logic (DiLL) is a sequent calculus enriching Linear Logic (LL) with the possibility of *linearizing* proofs. This linearization is semantically understood as differentiation at 0. Motivated by the need to explore the similarities between the differential structures inherited from logic and those inherited from physics, one would like to interpret formulas of DiLL by general topological vector spaces and non-linear proofs by smooth functions. The interpretation of the involutive linear negation of DiLL leads to the requirement of *reflexive* topological vector spaces, that is, topological vector spaces E such that L(L(E, R), R) ≃ E, otherwise expressed as E′′ ≃ E. In [14], the first author argued that in a classical smooth-linear setting, the exponential ! should be interpreted as a space of distributions with compact support [18], that is, !E := C<sup>∞</sup>(E, R)′. The first author also showed that this defines a strong monoidal functor ! from the category of Euclidean vector spaces to the category of reflexive locally convex and Hausdorff vector spaces. As reflexive spaces typically do not form a ∗-autonomous category (or even a monoidal closed category), in [14] the first author constructs a *polarized* model of DiLL structured as a chirality [17]. This polarized structure is also necessary here. In Sect. 5, formulas of DiLL<sup>0</sup> are interpreted in two different categories, depending on whether they interpret a positive or a negative formula.

*Main Content.* In this paper we construct an interpretation of the exponential ! (Definition 10) which is strong monoidal (Theorem 3). The exponential constructed in this paper is a generalization of the compact-support exponential from [14]. Explicitly, for a reflexive space E, the exponential !E is defined as the inductive limit of spaces C<sup>∞</sup>(R<sup>n</sup>, R)′, indexed by *linear continuous* functions f : R<sup>n</sup> → E (Definition 7),

$$!E := \varinjlim\_{f : \mathbb{R}^n \to E} \mathcal{C}^{\infty}(\mathbb{R}^n, \mathbb{R})' .$$

We also consider the "why not" connective ? (Definition 9): for a reflexive space E, ?E is interpreted as the space C<sup>∞</sup>(E′, R) of smooth scalar functions on E′. Explicitly, being the dual of !E′, ?E is the projective limit of spaces C<sup>∞</sup>(R<sup>n</sup>, R), indexed by the injective linear continuous functions f : R<sup>n</sup> → E′ (Proposition 4),

$$?E := \varprojlim\_{f:\mathbb{R}^n \to E'} \mathcal{C}^\infty(\mathbb{R}^n, \mathbb{R})\,.$$

An important drawback of this work is that the functoriality of ! is ensured only on isomorphisms, that is, ! is an endofunctor on the category Refl<sub>iso</sub> of reflexive spaces and isomorphisms between them. We use a technique developed by Ehrhard in [6] to show that this still provides a model of *finitary* Differential Linear Logic (DiLL<sup>0</sup>), that is, DiLL without the promotion rule. We also discuss how this construction leads to a polarized model of DiLL<sup>0</sup> (Sect. 5).

*Organization of the Paper.* Section 2 gives an overview of the development in DiLL which led to this paper and gives some background in functional analysis. In Sect. 3 we discuss higher-order functions and distributions, and prove strong monoidality. Section 4 provides the interpretation of the dereliction and codereliction and the bialgebraic structure of the exponential. Finally in Sect. 5 we discuss the polarized interpretation of formulas.

*Notation.* In this article, we borrow notation from Linear Logic. In particular, we use ⊸ to distinguish linear functions from non-linear ones: for example, f : E ⊸ F would be *linear continuous*, while g : E → F would only be smooth. We also denote elements of !E and ?E, which are indexed by linear continuous *injective* functions f : R<sup>n</sup> → E, in bold with their index as a subscript: **g**<sub>f</sub> ∈ !E or **f**<sub>f</sub> ∈ ?E.

### **2 Preliminaries**

#### **2.1 Differential Linear Logic and Its Semantics**

Linear Logic [9] refines Intuitionistic Logic with a linear negation, (−)<sup>⊥</sup>, and a notion of linearity of proofs, ⊸. More precisely, Linear Logic introduces the fundamental isomorphism between A ⇒ B, proofs of B from A, and !A ⊸ B, linear proofs of B from !A, the exponential of A. In particular, Linear Logic features a *dereliction* rule d, which allows one to consider linear proofs as particular cases of non-linear proofs:

$$\frac{A^\perp \vdash F}{!(A^\perp) \vdash F} \, d$$

Differential Linear Logic (DiLL) brings a notion of differentiation to the picture by introducing a *codereliction* rule d̄. By cut-elimination, the codereliction rule allows one to *linearize* a non-linear proof:

$$\frac{\vdash F,A}{\vdash F,!A} \; \bar{d}$$

In Linear Logic, the exponential group also features weakening and contraction rules. DiLL adds co-weakening and co-contraction rules, which in the context of this paper correspond respectively to integration and convolution (see [15] for more details). DiLL without promotion, or finitary Differential Linear Logic, is denoted DiLL<sup>0</sup> and is the original version of Differential Linear Logic by Ehrhard and Regnier [7]. Its exponential rules for {?, !} can be found in Fig. 1. The other rules of DiLL<sup>0</sup> correspond to the usual ones for the MALL group {⊗, `, ⊕, ×}. Non-finitary DiLL can be constructed by adding the promotion rule to DiLL<sup>0</sup>, which in particular requires functoriality of the exponential. Cut-elimination in DiLL and DiLL<sup>0</sup> generates *sums of proofs* [7], and therefore the categorical interpretation of proofs must be done in a category enriched over commutative monoids.

$$\frac{\vdash \Gamma}{\vdash \Gamma, ?E}\,w \qquad \frac{\vdash \Gamma, ?E, ?E}{\vdash \Gamma, ?E}\,c \qquad \frac{\vdash \Gamma, E}{\vdash \Gamma, ?E}\,d$$

$$\frac{}{\vdash\, !E}\,\bar{w} \qquad \frac{\vdash \Gamma, !E \quad \vdash \Delta, !E}{\vdash \Gamma, \Delta, !E}\,\bar{c} \qquad \frac{\vdash \Gamma, E}{\vdash \Gamma, !E}\,\bar{d}$$

**Fig. 1.** Exponential rules of DiLL<sup>0</sup>

Following Fiore's definition in [8], a categorical model of DiLL is an extension of Seely's axiomatization of categorical models of Linear Logic [20]. Explicitly, a model of DiLL consists of a ∗-autonomous category (L, ⊗, 1, (−)<sup>∗</sup>) with a finite biproduct structure × with zero object 0, a strong monoidal comonad ! : (L, ×, 0) → (L, ⊗, 1), and a natural transformation d̄ : id<sub>L</sub> ⇒ !, called the *codereliction* operator, which interprets differentiation at zero. A particularly important coherence for the codereliction is that composing the co-unit of the comonad d : ! ⇒ id<sub>L</sub> with d̄ results in the identity (the top left triangle of Definition 1). Intuitively, this means that differentiating a linear map results in the same linear map.
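This intuition can be made explicit on Euclidean spaces. The following is our rendering of the triangle d ∘ d̄ = id in the distribution model, with !R<sup>n</sup> = *E*′(R<sup>n</sup>) as in [14] (not verbatim from the paper):

```latex
% codereliction: differentiation at 0; dereliction: restriction to linear forms
\bar{d}_{\mathbb{R}^n} : v \longmapsto \big(f \mapsto Df(0)(v)\big) \in \mathcal{E}'(\mathbb{R}^n),
\qquad
d_{\mathbb{R}^n} : \phi \longmapsto \phi|_{(\mathbb{R}^n)'} \in (\mathbb{R}^n)'' \cong \mathbb{R}^n .
% Composing, for any linear form \ell:
(d \circ \bar{d})(v)(\ell) \;=\; D\ell(0)(v) \;=\; \ell(v),
\quad\text{hence}\quad d \circ \bar{d} = \mathrm{id}_{\mathbb{R}^n}.
```

The last line is exactly the statement that differentiating a linear map at 0 returns the map itself.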

*Working Without Promotion.* A particularity of our work is that we *do not* interpret promotion, and thus we only obtain a denotational model of DiLL<sup>0</sup>, not of DiLL. The main reason for this is that in the formula

$$\mathcal{E}'(E) := \varinjlim\_{f:\mathbb{R}^n \to E} \mathcal{E}'\_f(\mathbb{R}^n),$$

injectivity of the indexes f : R<sup>n</sup> → E is needed to have a well-defined order, and hence a properly defined inductive limit (Definition 6). Therefore the exponential constructed in this paper cannot be functorial with respect to every linear continuous morphism in TopVec. In the construction of the exponential, one needs to compose the injective indexes f with maps ℓ of the category (resp. their transposes ℓ′), and these composites ℓ ◦ f (resp. ℓ′ ◦ g) are required to again be injective. As shown by Trèves [21, Chapter 23.2], ℓ′ is injective if and only if ℓ has a dense image. Therefore we have no choice but to ask for isomorphisms, and thus we obtain an endofunctor on Refl<sub>iso</sub>, the category of reflexive spaces and linear continuous isomorphisms between them.

Models of DiLL<sup>0</sup> in which promotion is not necessarily interpreted were studied by Ehrhard in his survey on Differential Linear Logic [6]. He introduces *exponential structures*, which provide a categorical setting that differs from the traditional axiomatization of Seely's models.

**Definition 1** *[6, Section 2.5]***.** *Let* L *be a pre-additive* ∗*-autonomous category (i.e. a commutative-monoid-enriched* ∗*-autonomous category [6, Sect. 2.4]), and let* L*iso be the wide subcategory of* L *with only isomorphisms as morphisms. An* *exponential structure on* L *is a tuple* (!, w, c, w̄, c̄, d, d̄) *consisting of an endofunctor* ! : L*iso* → L*iso and families of morphisms of* L *(not necessarily of* L*iso) indexed by the objects of* L*:*

$$w\_A: !A \longrightarrow 1 \qquad c\_A: !A \longrightarrow !A \otimes !A \qquad \bar{w}\_A: 1 \longrightarrow !A \qquad \bar{c}\_A: !A \otimes !A \longrightarrow !A$$

$$d\_A : !A \longrightarrow A \qquad \bar{d}\_A : A \longrightarrow !A$$

*which are natural for morphisms of* L*iso, and such that for each object* A*,* (!A, wA, cA, w̄A, c̄A) *is a commutative bialgebra in* L*, and the following diagrams commute:*

The above commutative diagrams allow for a direct interpretation of the cut-elimination process of DiLL<sup>0</sup>. Ehrhard shows in particular that the interpretation of the structural and co-structural rules of DiLL<sup>0</sup> only needs the functoriality of the exponential on the isomorphisms [6, Sect. 2.5]. Indeed, in a *classical* model of DiLL (that is, a model in which the interpretation of the linear negation is involutive), functoriality on isomorphisms is needed to guarantee the duality between ? and !. Otherwise, the structural exponential rules are interpreted by natural transformations c, c̄, w, w̄, d, and d̄. These natural transformations can be constructed as in [8], following a co-monadic structure (!A, wA, μA) on each object !A [7, Sect. 2.6]. To sum up:

#### *Functoriality of the exponential on isomorphisms is needed for duality, but is not needed to interpret finitary proofs as morphisms of a category*.

That we have a model of DiLL<sup>0</sup> and not of DiLL fits well with our motivation, as we are looking for the computational counterpart of type theories modeled by analysis. DiLL<sup>0</sup> is indeed the sequent calculus which is refined into an understanding of Linear Partial Differential Equations in [14] and the meaning of promotion with respect to differential equations remains unclear. However, we are still able to construct a natural promotion-like morphism for our exponential (Definition 13).

#### **2.2 Reflexive Spaces and Distributions**

In this paper, we study and use the theory of locally convex topological vector spaces [12] to give concrete models of DiLL. Topological vector spaces generalize normed and metric spaces: continuity is characterized only by a collection of open sets, which need not come from a metric or a norm. In this section, we highlight some key concepts which hopefully will give the reader a better understanding of the difficulties of constructing models of DiLL using smooth spaces. We refer to [12] for details on topological vector spaces and to [18] for distribution theory.

By a locally convex topological vector space (lcs), we mean a *locally convex and Hausdorff topological vector space* on R. Briefly, these are vector spaces endowed with a topology generated by convex open subsets, such that scalar multiplication and addition are both continuous. For the rest of the section, we consider E and F two lcs.

**Definition 2.** *Denote* E ∼ F *for a linear isomorphism between* E *and* F *as* R*-vector spaces, and* E ≃ F *for a linear homeomorphism between* E *and* F *as topological vector spaces.*

**Definition 3.** *Denote by* Lb(E, F) *the lcs of all* linear continuous *functions between* E *and* F*, endowed with the* topology of uniform convergence on bounded subsets *[12] of* E*. When* F = R*, we write* E′ = Lb(E, R)*, called the strong dual of* E*.*

**Definition 4.** *Let* δ : E → E′′ *be the transpose of the evaluation map of* E′*, which is explicitly defined as follows:*

$$\delta: \begin{cases} E \longrightarrow E''\\ x \longmapsto \delta\_x: (f \longmapsto f(x)) \end{cases}$$

*A lcs* E *is said to be* **semi-reflexive** *if* δ *is a linear isomorphism, that is,* E ∼ E′′*. A semi-reflexive lcs* E *is* **reflexive** *when* δ *is a linear homeomorphism, that is,* E ≃ E′′*.*
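In finite dimensions reflexivity is automatic, and δ is just evaluation. A toy sketch (ours; it identifies (R<sup>2</sup>)′ with R<sup>2</sup> via the dot product, which is special to the finite dimensional case):

```python
# For E = R^2, every linear functional is a dot product with a fixed
# vector, and delta: x -> (f -> f(x)) identifies E with E''.

def functional(v):
    """The element of E' given by the dot product with v."""
    return lambda x: sum(a * b for a, b in zip(v, x))

def delta(x):
    """The transpose of evaluation, delta_x : E' -> R."""
    return lambda f: f(x)

x = (3.0, -2.0)
f = functional((1.0, 4.0))   # f(x1, x2) = x1 + 4*x2
print(delta(x)(f))           # f(3, -2) = 3 - 8 = -5.0
```

The infinite dimensional subtlety, which the definition above addresses, is whether δ is also a homeomorphism for the strong dual topologies.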

The following proposition is crucial to the constructions of this paper. In terms of polarization, it shows how semi-reflexivity is a negative construction, while reflexivity mixes positive and negative requirements.

#### **Proposition 1** *[12, Chapter 11.4]***.**


Next we briefly recall a few facts about distributions.

**Definition 5.** *For each* n ∈ N*, a function* f : R<sup>n</sup> → R *is said to be smooth if it is infinitely differentiable. Let* E(R<sup>n</sup>) = C<sup>∞</sup>(R<sup>n</sup>, R) *denote the space of all smooth functions* f : R<sup>n</sup> → R*, endowed with the topology of uniform convergence of all differentials on all compact subsets of* R<sup>n</sup> *[12]. The strong dual of* E(R<sup>n</sup>)*, denoted* E′(R<sup>n</sup>)*, is called the space of distributions with compact support.*

We now recall the famous Schwartz kernel theorem, which states that the kernel construction sending f ⊗ g ∈ *E*′(R<sup>n</sup>) ⊗ *E*′(R<sup>m</sup>) to f · g ∈ *E*′(R<sup>n+m</sup>) extends to an isomorphism on the completed tensor product *E*′(R<sup>n</sup>) ⊗̂ *E*′(R<sup>m</sup>):

**Theorem 1 (**[18]**).** *For any* n, m <sup>∈</sup> <sup>N</sup>*, we have the following:*

$$\mathcal{E}'(\mathbb{R}^n)\,\hat{\otimes}\_\pi\,\mathcal{E}'(\mathbb{R}^m) \simeq \mathcal{E}'(\mathbb{R}^{n+m}) \simeq \mathcal{L}\_b(\mathcal{E}(\mathbb{R}^m), \mathcal{E}'(\mathbb{R}^n))$$
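A concrete instance (a standard fact of distribution theory; the example is ours): Dirac masses are sent to Dirac masses by the kernel map,

```latex
\delta_a \otimes \delta_b \;\longmapsto\; \delta_{(a,b)},
\qquad a \in \mathbb{R}^n,\; b \in \mathbb{R}^m,
% since for every test function h \in C^\infty(R^{n+m}):
(\delta_a \otimes \delta_b)(h) \;=\; h(a, b) \;=\; \delta_{(a,b)}(h).
```

so the isomorphism restricts on such simple tensors to pairing of support points.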

**Theorem 2 (**[14]**).** *There is a first-order polarized denotational model of* DiLL<sup>0</sup> *in which the exponential is interpreted as a space of distributions:* !(R<sup>n</sup>) := *E* (R<sup>n</sup>)*.*

This interpretation did not generalize to higher order, as we were unable to define !E for an infinite dimensional space E, even for those sharing the topological properties of spaces of smooth functions<sup>1</sup>. For example, the definition of !!R is in no way obvious. This is the problem we tackle in the following sections.

#### **3 Higher-Order Distributions and Kernel**

In this section we define spaces of higher-order functions and distributions, we prove that they are reflexive (Proposition 2) and verify a kernel theorem (Theorem 3).

**Definition 6.** *Let* E *be a lcs, and let* f : R<sup>n</sup> → E *and* g : R<sup>m</sup> → E *be two linear continuous injective functions. We say that* f ≤ g *when* n ≤ m *and* f = g|<sub>R<sup>n</sup></sub>*, that is,* f = g ◦ ιn,m *where* ιn,m : R<sup>n</sup> → R<sup>m</sup> *is the canonical injection.*

The ordering in the above definition provides an order on the set of dependent pairs (n, f), where n ∈ N and f : R<sup>n</sup> → E is linear injective. This will allow us to construct an inductive limit (a categorical colimit) of lcs.

**Definition 7.** *Let* E *any lcs.*

*1. For every linear continuous injective function* f : R<sup>n</sup> → E*, define the lcs* E′<sub>f</sub>(R<sup>n</sup>) *as follows:*

$$\mathcal{E}\_f'(\mathbb{R}^n) := \mathcal{C}^\infty(\mathbb{R}^n, \mathbb{R})'$$

<sup>1</sup> These spaces are in particular nuclear (F)-spaces, see [14].

*2. Define E* (E)*, the space of distributions on* E*, as follows:*

$$\mathcal{E}'(E) := \varinjlim\_{f:\mathbb{R}^n \to E} \mathcal{E}'\_f(\mathbb{R}^n)$$

*that is, the inductive limit [12, Chapter 4.5] (or colimit) in the category* TopVec *of the family of lcs* {E′<sub>f</sub>(R<sup>n</sup>) | f : R<sup>n</sup> → E *linear continuous injective*}*, directed under the inclusion maps defined as*

$$S\_{f,g}: \mathcal{E}'\_f(\mathbb{R}^n) \longrightarrow \mathcal{E}'\_g(\mathbb{R}^m), \quad \phi \longmapsto (h \longmapsto \phi(h \circ \iota\_{n,m})),$$

*when* f ≤ g*.*

Intuitively, this definition of *E*′(E) says that a distribution with compact support on E is a distribution with a finite dimensional compact support K ⊂ R<sup>n</sup>.
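The inclusion maps S<sub>f,g</sub> above can be illustrated on the simplest distributions with compact support: finite combinations of Dirac masses (a toy sketch, ours; general distributions are of course not of this form):

```python
# A toy model of S_{f,g}: a distribution on R^n represented as a finite
# weighted sum of Dirac masses, pushed forward along the canonical
# injection iota_{n,m} : R^n -> R^m.

def iota(point, m):
    """Canonical injection R^n -> R^m (n <= m): pad with zeros."""
    return tuple(point) + (0.0,) * (m - len(point))

def push_forward(phi, m):
    """S_{f,g}(phi)(h) = phi(h o iota): on Diracs, just move the points."""
    return [(w, iota(p, m)) for (w, p) in phi]

def pair(phi, h):
    """Apply a Dirac combination phi to a test function h."""
    return sum(w * h(p) for (w, p) in phi)

# phi = 2*delta_{1} - delta_{3} on R^1, pushed into R^3.
phi = [(2.0, (1.0,)), (-1.0, (3.0,))]
h = lambda p: p[0] ** 2 + p[1] + p[2]        # a test function on R^3
print(pair(push_forward(phi, 3), h))          # 2*h(1,0,0) - h(3,0,0) = -7.0
```

By construction, pairing the pushforward with h agrees with pairing phi with h ∘ ι, which is exactly the defining equation of S<sub>f,g</sub>.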

**Proposition 2.** *For any lcs* E*, E* (E) *is a reflexive lcs.*

The following proposition justifies the notation of *E* (R<sup>n</sup>) from Definition 5.

**Proposition 3.** *If* E ≃ R<sup>n</sup> *for some* n ∈ N*, then* E′(E) ≃ C<sup>∞</sup>(R<sup>n</sup>)′*.*

As *E*′(E) is reflexive, we give a special (yet obvious) notation for the strong dual of *E*′(E).

**Definition 8.** *For a reflexive lcs* E*, let* E(E) *denote the strong dual of* E′(E)*.*

Since the strong dual of a reflexive lcs is again reflexive [12], it follows from Proposition 2 that for any reflexive lcs E, *E*(E) is also reflexive.

The strong dual of a projective limit is *linearly isomorphic* to the inductive limit of the duals, however as noted in [12, Chapter 8.8.12], the topologies may not coincide. When E is endowed with its Mackey topology (which is the case in particular when E is reflexive), then the topologies do coincide.

**Proposition 4.** *Let* E *be a reflexive lcs. For every linear continuous injective function* f : R<sup>n</sup> → E*, define the lcs* E<sub>f</sub>(R<sup>n</sup>) := C<sup>∞</sup>(R<sup>n</sup>, R)*. Then we have the following linear homeomorphism:*

$$\mathcal{E}(E) \simeq \varprojlim\_{f:\mathbb{R}^n \to E} \mathcal{E}\_f(\mathbb{R}^n)$$

*where the lcs on the right is the projective limit [12, Chapter 2.6] in* TopVec *of the family of lcs* {E<sub>f</sub>(R<sup>n</sup>) | f : R<sup>n</sup> → E *linear continuous injective*}*, with projections defined as:*

$$T\_{g,f} = S'\_{f,g} : \mathcal{E}\_g(\mathbb{R}^m) \longrightarrow \mathcal{E}\_f(\mathbb{R}^n), \quad h \longmapsto h \circ \iota\_{n,m},$$

*when* f ≤ g*.*

The elements **f** ∈ *E*(E) are families **f** := (**f**<sub>f</sub>)<sub>f:R<sup>n</sup>→E</sub> such that if f ≤ g, then **f**<sub>f</sub> = **f**<sub>g</sub> ◦ ιn,m. The intuition here is that distributions on a reflexive lcs E are in fact distributions with compact support on a finite dimensional space, or equivalently that smooth functions E → R are functions which are smooth when restricted to R<sup>n</sup> (viewed as a finite dimensional subspace of E). This makes it possible to define multinomials on E in the following way:

$$P(x \in \mathbb{R}^k) = \sum\_{\alpha \in \mathbb{N}^n} a\_{\alpha}\, x\_1^{\alpha\_1} \dots x\_n^{\alpha\_n}$$

where the sum is over finitely many multi-indices $\alpha$, and where $\mathbb{R}^k$ is either embedded into or projected onto $\mathbb{R}^n$ in the canonical way.

It also seems possible to provide a setting restricted specifically to higher-order spaces of distributions and not to every reflexive space. Indeed, we would like to describe smooth scalar functions on $\mathcal{E}(\mathbb{R}^n)$ as

$$h \in \mathcal{E}(\mathbb{R}^n) \mapsto h(0)^2$$

taking into account that we have non-linear functions as inputs. This seems to indicate another direction of research, in which we would construct smooth functions indexed by the Dirac function $\delta : \mathbb{R}^n \to E = \mathcal{E}'(\mathbb{R}^n)$ as defined in Definition 4.

*The Kernel Theorem.* We now provide the kernel theorem for the spaces $\mathcal{E}'(E)$. Indeed, the spaces of functions are the ones which can be described as projective limits, and projective limits are the ones which commute with the completed projective tensor product $\hat{\otimes}\_{\pi}$. While we do not provide a proof here, we would like to highlight that the proof of this theorem depends heavily on the fact that the considered spaces of functions are nuclear spaces [12].

**Theorem 3.** *For every lcs* E *and* F*, we have a linear homeomorphism:*

$$\mathcal{E}'(E)\, \hat{\otimes}\_{\pi}\, \mathcal{E}'(F) \simeq \mathcal{E}'(E \oplus F).$$

We now give the definitions of the functors ? and !, both of which agree with the previous characterization described by the first author in [14] on Euclidean spaces $\mathbb{R}^n$. However, as discussed in the introduction, while these functors can be defined properly on all objects, they will only be defined on isomorphisms. So let $\mathrm{Refl}\_{iso}$ denote the category of reflexive lcs and linear homeomorphisms between them.

**Definition 9.** *Define the endofunctor $? : \mathrm{Refl}\_{iso} \to \mathrm{Refl}\_{iso}$ as follows:*

$$? : \begin{cases} \mathrm{Refl}\_{iso} \longrightarrow \mathrm{Refl}\_{iso} \\ E \mapsto \mathcal{E}(E') \\ \ell : E \longrightarrow F \longmapsto {?\ell} : \mathcal{E}(E') \longrightarrow \mathcal{E}(F') \end{cases} \tag{1}$$

*where for $\mathbf{f} \in \mathcal{E}(E')$, the $g : \mathbb{R}^m \to F'$ component of $?\ell(\mathbf{f}) \in \mathcal{E}(F')$ is defined as:*

$$?\ell(\mathbf{f})\_g = \mathbf{f}\_{\ell' \circ g}$$

*where $\ell' : F' \to E'$ denotes the transpose of $\ell$.*

Note that $?\ell : \mathcal{E}(E') \to \mathcal{E}(F')$ is defined by the universal property of the projective limit, that is, $?\ell$ is uniquely determined by post-composition with the projections $\pi\_g : \mathcal{E}(F') \to \mathcal{E}\_g(\mathbb{R}^m)$ for each linear continuous injective function $g : \mathbb{R}^m \to F'$. We also note that $\mathbf{f}\_{\ell' \circ g}$ is well-defined since $\ell'$ is injective and therefore so is $\ell' \circ g$. The universality of the projective limit also ensures that $?\ell$ is an isomorphism and that $?$ is functorial.

**Definition 10.** *Define the functor $! : \mathrm{Refl}\_{iso} \to \mathrm{Refl}\_{iso}$ on objects as $!E := (?(E'))'$ and on isomorphisms as $!\ell := (?(\ell'))'$. Explicitly, $!$ is defined as follows:*

$$! : \begin{cases} \mathrm{Refl}\_{iso} \longrightarrow \mathrm{Refl}\_{iso} \\ E \mapsto \mathcal{E}'(E) \\ \ell : E \longrightarrow F \longmapsto {!\ell} : \mathcal{E}'(E) \longrightarrow \mathcal{E}'(F) \end{cases} \tag{2}$$

*where for the $f : \mathbb{R}^n \to E$ component $\mathbf{f}\_f$ of $\mathbf{f} \in \mathcal{E}'(E)$, $!\ell(\mathbf{f}\_f) \in \mathcal{E}'(F)$ is defined as:*

$$!\ell(\mathbf{f}\_f) = \mathbf{f}\_{\ell \circ f : \mathbb{R}^n \to F}$$

As before, $!\ell$ is defined by the co-universal property of the inductive limit, that is, $!\ell$ is defined by pre-composition with the injections $\iota\_f : \mathcal{E}'\_f(\mathbb{R}^n) \to \mathcal{E}'(E)$ for every linear continuous injective function $f : \mathbb{R}^n \to E$. Functoriality of $!$ is ensured by functoriality of $?$ and reflexivity of the objects.

#### **4 Structural Morphisms on the Exponential**

We consider the exponential from the DiLL model of convenient vector spaces in [2] as a guideline for defining the structural morphisms on $!E$. In that setting, structural operations can be defined on Dirac distributions. For example, the codereliction $\bar{d}\_{conv}$ maps $\delta\_x$ to $x$. Here the mapping $x \mapsto \delta\_x$ must be understood as the linear continuous function which maps $x \in E$ to the distribution $\delta\_x : (\mathbf{f}\_f)\_f \in \mathcal{E}(E) \mapsto \mathbf{f}\_f(f^{-1}(x))$, an element of $\mathcal{E}'(E)$, which we show is well defined below.

#### **4.1 Dereliction and Co-dereliction**

**Definition 11.** *For a reflexive lcs* E*, define the following linear continuous morphism:*

$$d\_E : \begin{cases} !E \longrightarrow E'' \simeq E \\ \phi \mapsto \big( \ell \in E' \mapsto \phi\big((\ell \circ f)\_{f : \mathbb{R}^n \to E}\big) \big) \end{cases} \tag{3}$$

We stress that $d\_E$ is a map in $\mathrm{Refl}$ and not a map in $\mathrm{Refl}\_{iso}$ (though this is sufficient for Definition 1). The map $d\_E$ is well defined as $\ell \circ f$ is a linear continuous function $\mathbb{R}^n \to \mathbb{R}$, and thus is smooth and belongs in particular to $\mathcal{E}(\mathbb{R}^n)$. Also, as we are working with reflexive spaces, $d\_E$ could equivalently have been described as a map of the following type:

$$\begin{aligned} E &\longrightarrow {?E} \\ x &\longmapsto (ev\_x \circ f \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}))\_{f : \mathbb{R}^n \to E'} \end{aligned} \tag{4}$$

**Lemma 1.** *The morphisms $d\_E$ are natural with respect to linear homeomorphisms, that is, maps of $\mathrm{Refl}\_{iso}$. Explicitly, if $\ell : E \to F \in \mathrm{Refl}\_{iso}$ then $d\_F \circ {!\ell} = \ell \circ d\_E$.*

We now study the interpretation of the codereliction $\bar{d}$. Let $D\_0 : \mathcal{C}^\infty(\mathbb{R}^n) \to (\mathbb{R}^n)'$ denote the operator which maps a function to its differential at $0$:

$$D\_0 : \begin{cases} \mathcal{C}^\infty(\mathbb{R}^n) \longrightarrow (\mathbb{R}^n)' \\ \mathbf{f} \mapsto \left( v \in \mathbb{R}^n \mapsto \lim\_{t \to 0} \frac{\mathbf{f}(tv) - \mathbf{f}(0)}{t} = \sum\_{i=1}^n \frac{\partial \mathbf{f}}{\partial x\_i}(0)\, v\_i \right) \end{cases}$$

The operator $D\_0$ is linear in $\mathbf{f} \in \mathcal{C}^\infty(\mathbb{R}^n)$. It is continuous: the reciprocal image by $D\_0$ of the polar $B\_{0,1}$ is the set of all functions $\mathbf{f} \in \mathcal{C}^\infty(\mathbb{R}^n)$ whose partial derivatives of order one have maximal value $1$ on the compact $\{0\}$. This contains the set $\{\mathbf{f} \mid \forall i,\ |\frac{\partial \mathbf{f}}{\partial x\_i}(0)| < 1\}$, which is open in the topology described in Definition 5.

**Definition 12.** *For a reflexive lcs* E*, define the following linear continuous morphism:*

$$\bar{d}\_E : \begin{cases} E \longrightarrow {!E} \simeq (\mathcal{E}(E))' \\ x \mapsto \big( (\mathbf{f}\_f)\_{f : \mathbb{R}^n \to E} \mapsto D\_0\mathbf{f}\_f(f^{-1}(x)) \big), \\ \quad \text{where } f \text{ is injective such that } x \in \mathrm{Im}(f) \end{cases} \tag{5}$$

We should explain why the choice of $f^{-1}(x)$ does not matter. Here $f^{-1}(x)$ is the *linear* argument of the differentiation. Indeed, suppose that $f \leq g$, that is, $f = g \circ \iota\_{n,m}$. Then by definition of the projective limit we have $\mathbf{f}\_f = \mathbf{f}\_g \circ \iota\_{n,m}$ and:

$$\begin{aligned} D\_0 \mathbf{f}\_f(f^{-1}(x)) &= D\_0(\mathbf{f}\_g \circ \iota\_{n,m})\big((g \circ \iota\_{n,m})^{-1}(x)\big) \\ &= D\_0 \mathbf{f}\_g\big(D\_0 \iota\_{n,m}(\iota\_{n,m}^{-1}(g^{-1}(x)))\big) \\ &= D\_0 \mathbf{f}\_g\big(\iota\_{n,m}(\iota\_{n,m}^{-1}(g^{-1}(x)))\big) \qquad (\text{as } \iota\_{n,m} \text{ is linear}) \\ &= D\_0 \mathbf{f}\_g(g^{-1}(x)) \end{aligned}$$

As any pair of linear functions $f : \mathbb{R}^n \to E$ and $g : \mathbb{R}^m \to E$ is bounded above by $f \times g : \mathbb{R}^{n+m} \to E$, we obtain the required uniqueness.

Similar to the dereliction, the codereliction could alternatively have been described as a map of the following type:

$$\begin{aligned} \mathcal{E}(E') &\longrightarrow E'' \simeq E \\ (\mathbf{f}\_f)\_{f : \mathbb{R}^n \to E'} &\longmapsto \big( \ell \in E' \mapsto D\_0\mathbf{f}\_f(f^{-1}(\ell)) \big) \end{aligned} \tag{6}$$

We again stress that $\bar{d}\_E$ is not a map in $\mathrm{Refl}\_{iso}$.

**Lemma 2.** *The morphisms $\bar{d}\_E$ are natural with respect to linear homeomorphisms, that is, maps of $\mathrm{Refl}\_{iso}$. Explicitly, if $\ell : E \to F \in \mathrm{Refl}\_{iso}$ then $\bar{d}\_F \circ \ell = {!\ell} \circ \bar{d}\_E$.*

Finally, we observe that $d\_E$ and $\bar{d}\_E$ satisfy the all-important coherence condition between derelictions and coderelictions.

**Proposition 5.** *For a reflexive lcs $E$, $d\_E \circ \bar{d}\_E = \mathrm{Id}\_E$.*
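A sketch of why this identity holds, obtained by unfolding Definitions 11 and 12 (with $f$ injective such that $x \in \mathrm{Im}(f)$, and $\ell \in E'$):

$$d\_E(\bar{d}\_E(x))(\ell) = \bar{d}\_E(x)\big((\ell \circ f)\_f\big) = D\_0(\ell \circ f)(f^{-1}(x)) = (\ell \circ f)(f^{-1}(x)) = \ell(x),$$

since $\ell \circ f$ is linear and therefore equals its own differential at $0$.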

#### **4.2 (Co-)contraction and (Co-)weakening**

In this section, we define the interpretation of the other exponential rules: weakening $w$, co-weakening $\bar{w}$, contraction $c$, and co-contraction $\bar{c}$, generalizing those of [14]. We start with weakening and co-weakening, which are fairly straightforward.

$$w : \begin{cases} !E \longrightarrow \mathbb{R} \\ \phi \mapsto \sum\_{f} \phi\_{f}(\mathbf{1}) \end{cases} \qquad \bar{w} : \begin{cases} \mathbb{R} \longrightarrow {!E} \\ 1 \mapsto \delta\_0 : \big( (\mathbf{f}\_f)\_f \in \mathcal{E}(E) \mapsto \mathbf{f}\_f(0) \big) \text{ for any } f \end{cases}$$

According to [8], the rules $c$ and $\bar{c}$ are interpreted respectively via the kernel theorem and pre-composition with the diagonal $E \to E \times E$ and co-diagonal $E \times E \to E$ maps of the biproduct. This is, however, not defined in a context where $!$ is functorial only on isomorphisms. Thus we give a direct, componentwise interpretation of contraction and co-contraction.

$$c : \begin{cases} !E \longrightarrow {!(E \times E)} \simeq {!E} \otimes {!E} \\ \phi \mapsto \Big( (\mathbf{g}\_g)\_{g : \mathbb{R}^n \hookrightarrow E \times E} \mapsto \phi\big( (\mathbf{g}\_{x \in \mathbb{R}^n \mapsto (f(x), f(x))})\_{f : \mathbb{R}^n \hookrightarrow E} \big) \Big) \end{cases}$$

$$\bar{c} : \begin{cases} !E \otimes {!E} \longrightarrow {!E} \\ \phi \otimes \psi \longmapsto \Big( (\mathbf{f}\_f)\_{f : \mathbb{R}^n \hookrightarrow E} \mapsto \phi\Big( \big( x \in \mathbb{R}^n \mapsto \psi\big( (y \in \mathbb{R}^m \mapsto \mathbf{f}\_f(x) + \mathbf{f}\_{f'}(y))\_{f'} \big) \big)\_f \Big) \Big) \end{cases}$$

where $f : \mathbb{R}^n \hookrightarrow E$ and $f' : \mathbb{R}^m \hookrightarrow E$.

**Theorem 4.** *The morphisms $(w, \bar{w}, c, \bar{c}, d, \bar{d})$ satisfy the coherences of an exponential structure on $!E$, as detailed in Definition 1.*

We note that this does not give an exponential structure per se, since $\mathrm{Refl}$ is not a monoidal category, as we will explain in Sect. 5. That said, in Sect. 5 we are still able to construct a *polarized* model of DiLL$\_0$.

#### **4.3 Co-multiplication**

The categorical interpretation of the exponential rules of linear logic requires a comonad $! : \mathcal{L} \to \mathcal{L}$. However, in this paper the exponential $!$ is functorial only on isomorphisms. As such, one cannot interpret the promotion rule of linear logic, as this requires functoriality of $!$ on the interpretation of any proof (and typically on linear continuous maps which are not isomorphisms). That said, functoriality is the only missing ingredient, and one can still define natural transformations with the same types as the co-multiplication and co-unit of the comonad. This section details this point, leaving the exploration of a functorial $!$ for future work.

**Definition 13.** *For a reflexive lcs* E*, define the following linear continuous morphism:*

$$\mu\_E : \begin{cases} !E \longrightarrow {!!E} \\ \phi \mapsto \left( \left( (\mathbf{g}\_g)\_g \in \mathcal{E}(!E) \simeq \varprojlim\_g \mathcal{C}\_g^{\infty}(\mathbb{R}^m) \right) \mapsto \mathbf{g}\_g(g^{-1}(\phi)) \right), \\ \quad \text{where } g : \mathbb{R}^m \to {!E} \text{ is injective and } \phi \in \mathrm{Im}(g) \end{cases} \tag{7}$$

This is well defined: as for the codereliction (5), one shows that the term $\mathbf{g}\_g(g^{-1}(\phi))$ does not depend on the choice of the injective linear $g : \mathbb{R}^m \to {!E}$ with $\phi \in \mathrm{Im}(g)$ nor on the representative $\mathbf{g}\_g \in \mathcal{C}^\infty\_g(\mathbb{R}^m)$. Moreover, there is at least one linear function $g : \mathbb{R}^m \to {!E}$ which has $\phi$ in its image.

**Lemma 3.** *The morphisms $\mu\_E$ are natural with respect to linear homeomorphisms, that is, maps of $\mathrm{Refl}\_{iso}$. Explicitly, if $\ell : E \to F \in \mathrm{Refl}\_{iso}$ then $\mu\_F \circ {!\ell} = {!!\ell} \circ \mu\_E$.*

**Proposition 6.** *For any reflexive lcs $E$, $d\_{!E} \circ \mu\_E = \mathrm{Id}\_{!E}$.*

The identity of Proposition 6 is one of the identities of a comonad. The other comonad identities require applying ! to μ and d, which we cannot do in our context as ! is only defined on isomorphisms.

### **5 A Model of DiLL<sup>0</sup>**

In Sect. 4 we defined the structural morphisms on the exponential and proved the equations allowing one to interpret proofs of DiLL$\_0$ by morphisms in $\mathrm{Refl}$, independently of cut-elimination. We now detail which categories allow us to interpret formulas of MALL. This will be done in a polarized setting generalizing that of [14].

*Polarization.* So far we have constructed an exponential $! : \mathrm{Refl}\_{iso} \to \mathrm{Refl}\_{iso}$ which is strong monoidal. However, the category of reflexive spaces is too big to give us a model of DiLL$\_0$. Interpreting the multiplicative connectives requires a monoidal setting, and reflexive spaces are not stable under topological tensor products. If we study more closely the definition of the spaces of higher-order smooth functions, we see that their reflexivity follows from a more restrictive class of spaces. These spaces are, however, not stable under duality, thus resulting in a *polarized* model of DiLL$\_0$.

In this section we briefly show how the techniques developed above construct a *polarized model* of DiLL$\_0$. The syntax of polarized (differential) linear logic [16] is recalled below. A distinction is made between positive formulas (preserved by ⊗ and ⊕) and negative formulas (preserved by ⅋ and &). The same deduction rules apply.

> Negative Formulas: N, M := ⊥ | ⊤ | ?P | N ⅋ M | N × M | P<sup>⊥</sup>
> Positive Formulas: P, Q := 1 | 0 | !N | P ⊗ Q | P ⊕ Q | N<sup>⊥</sup>

Models of polarized linear logic are axiomatized categorically as an *adjunction between a category of positives and a category of negatives*, where the two interpretations of negation play the role of adjoint functors. These categories obey the axiomatics of chiralities [17].

*Additives.* Interpreting the additive connectives of linear logic is straightforward. The product × and coproduct ⊕ of lcs are linearly homeomorphic for finite index sets and therefore give biproducts, which leads to the usual commutative-monoid enrichment as described in [8].

*Multiplicatives.* When sticking to finite dimensional spaces or normed spaces, duality is pretty straightforward in the sense that the dual of a normed space is still normed. This, however, is no longer the case when one generalizes to metric spaces. Indeed, the dual of a metric space may not be endowed with a metric. A Fréchet space, or (F)-space, is a complete and metrizable lcs. The duals of these spaces are not metrizable in general, but they are (DF)-spaces (see [10] for the definition):

**Proposition 7 ([11] IV.3.1).**


Typical examples of nuclear (F)-spaces are the spaces of smooth functions $\mathcal{E}(\mathbb{R}^n)$, while typical examples of nuclear (DF)-spaces are the spaces of distributions with compact support $\mathcal{E}'(\mathbb{R}^n)$. In particular, all these spaces are reflexive. In [14], the first author interpreted positive formulas as nuclear (DF)-spaces, while negative formulas were interpreted as (F)-spaces. Following the construction of Sect. 3, we will consider respectively inductive limits and projective limits.

**Definition 14.** *A lcs is said to be an* Lnf*-space if it is a regular projective limit of nuclear Fréchet spaces. The category of* Lnf*-spaces and linear continuous injective maps is denoted* LNF*. A lcs* E *is said to be an* Lndf*-space if it is an inductive limit of nuclear complete (DF)-spaces.*

**Proposition 8.** *1. An* Lnf*-space* E *is reflexive. 2. The dual of an* Lnf*-space is an* Lndf*-space.*

The above proposition can be proven using the same techniques as those used for computing the dual of $\mathcal{E}(E)$.

The difficulty of constructing a model of MLL in topological vector spaces lies in choosing the topology which will make the tensor product associative and commutative on the already chosen category of lcs. Contrary to what happens in a purely algebraic setting, the definition of a topological tensor product is not straightforward, and several topologies can be defined, each corresponding to a different notion of continuity for bilinear maps [10]. On nuclear spaces, such as $\mathcal{E}(\mathbb{R}^n)$ and $\mathcal{E}'(\mathbb{R}^n)$, most of these tensor products coincide with one another. In [14], both multiplicative connectives (⊗ and ⅋) were interpreted as the completed projective (equivalently injective) tensor product $\hat{\otimes}\_{\pi}$ (see [12, 15.1 and 21.2]). This property is lost when working with limits. However, there is still a good interpretation of ⅋ for Lnf-spaces (which are thus the interpretation of negative formulas). Indeed, the completed injective tensor product $\hat{\otimes}\_{\varepsilon}$ of a projective limit of lcs is the projective limit of the completed injective tensor products [12, 16.3.2]. Taking the duals of Theorem 3 applied to $E$ and $F$ gives the following:

**Proposition 9.** *For any reflexive spaces* E *and* F *we have a linear homeomorphism:*

$$?E \hat{\otimes}\_{\varepsilon} ?F \simeq ?(E \oplus F).$$

This shows that ⅋ is interpreted by $\hat{\otimes}\_{\varepsilon}$. The multiplicative conjunction ⊗ is then interpreted as the dual of $\hat{\otimes}\_{\varepsilon}$, which is not necessarily linearly homeomorphic to $\hat{\otimes}\_{\pi}$.

#### **6 Conclusion**

In this paper, we extended the polarized model of DiLL without higher order constructed in [14] to a higher-order polarized model of DiLL$\_0$. The motivating idea was that computation on spaces of functions uses only a finite number of arguments. This led to constructing an exponential on a reflexive lcs as an inductive limit of exponentials of finite dimensional vector spaces. While this exponential is only functorial for linear homeomorphisms, we were still able to provide structural morphisms interpreting (co-)weakening, (co-)contraction, and (co-)dereliction, and hints of a comonad.

The next step would be to extend the definition of the exponential in this paper to an interpretation of the promotion rule, and thus of LL; this could be done through an epi-mono decomposition of arrows in $\mathrm{Refl}$. Another task is to properly work out which tensor product of reflexive spaces will provide a model of DiLL. Such a model should use chiralities [17] and underline the similarities between shifts and (co-)dereliction.

More generally, this work highlights again that the interpretation of the exponential in lcs relies on a computing principle. Indeed, it always requires finding a higher-order extension of distributions. While what we have constructed here relies on a finitary principle, the construction of a free exponential in [3] relies on the principle that higher-order operations are computed on Dirac distributions $\delta\_x$. That is, the exponential is constructed following a discretization scheme. The appearance of such numerical methods in a semantic study of DiLL provides another link between theoretical computer science and mathematical physics. This opens the door to relating the numerical schemes of numerical analysis to the theoretical study of programming languages.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Languages Ordered by the Subword Order**

Dietrich Kuske<sup>1</sup> and Georg Zetzsche<sup>2</sup>

<sup>1</sup> Technische Universität Ilmenau, Ilmenau, Germany — dietrich.kuske@tu-ilmenau.de
<sup>2</sup> Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern, Germany — georg@mpi-sws.org

**Abstract.** We consider a language together with the subword relation, the cover relation, and regular predicates. For such structures, we consider the extension of first-order logic by threshold- and modulo-counting quantifiers. Depending on the language, the used predicates, and the fragment of the logic, we determine four new combinations that yield decidable theories. These results extend earlier ones where only the language of all words without the cover relation and fragments of first-order logic were considered.

**Keywords:** Subword order · First-order logic · Counting quantifiers · Decidable theories

#### **1 Introduction**

The subword relation (sometimes called scattered subword relation) is a simple example of a well-quasi ordering [7]. This property allows its prominent use in the verification of infinite-state systems [4]. The subword relation can be understood as embeddability of one word into another. This embeddability relation has been considered for other classes of structures like trees, posets, semilattices, lattices, graphs etc. [8–11,14–16,22,23].

We are interested in logics over the subword order. Prior work on this has concentrated on first-order logic where the universe consists of all words over some alphabet. In this setting, we already have a rather precise picture about the border between decidability and undecidability: For the subword order alone, the ∃<sup>∗</sup>-theory is decidable [17] and the ∃<sup>∗</sup>∀<sup>∗</sup>-theory is undecidable [6,12]. If we add constants to the signature, already the ∃<sup>∗</sup>-theory becomes undecidable [6]. With regular predicates, the two-variable theory is decidable, but the three-variable theory is undecidable [12].

Thus, the decidable theories identified so far leave little room to express natural properties. First, the universe is confined to the set of all words and

Part of the results were obtained when the second author was affiliated with the Laboratoire Spécification et Vérification (ENS Paris-Saclay) and supported by a fellowship within the Postdoc-Program of the German Academic Exchange Service (DAAD) and by Labex DigiCosme, Université Paris-Saclay, project VERICONISS.

© The Author(s) 2019

M. Bojańczyk and A. Simpson (Eds.): FOSSACS 2019, LNCS 11425, pp. 348–364, 2019. https://doi.org/10.1007/978-3-030-17127-8_20

predicates for subsets quickly incur undecidability. Moreover, neither in the ∃<sup>∗</sup>- nor in the two-variable fragment of first-order logic can one express the cover relation ⊏· (i.e., "u is a proper subword of v and there is no word properly between these two"). As another example, one cannot express threshold properties like "there are at most k subwords with a given property" in either of these two logics.

In this paper, we aim to identify decidable logics that are more expressive. To that end, we consider four additions to the expressivity of the logic:


Formally, this means we consider structures of the form

(L, ⊑, ⊏·, (K ∩ L)<sub>K regular</sub>, (w)<sub>w∈L</sub>),

where the universe is a language $L \subseteq \Sigma^*$, ⊑ is the subword ordering, ⊏· is the cover relation, there is a predicate $K \cap L$ for each regular $K \subseteq \Sigma^*$, and there is a constant symbol for each $w \in L$. Moreover, we consider fragments of the logic C+MOD, which extends first-order logic by threshold- and modulo-counting quantifiers.

The key idea of this paper is to find decidable theories by varying the universe $L$ and thereby either (i) simplifying the structure (L, ⊑) enough to obtain decidability even with the extensions above or (ii) generalizing existing results that currently only apply to $L = \Sigma^*$. This leads to the following results.


Our first result is shown by a first-order interpretation of the structure in $(\mathbb{N}, +)$. Since $L \subseteq w\_1^* \cdots w\_n^*$, instead of words, one can argue about vectors $(x\_1, \dots, x\_n) \in \mathbb{N}^n$ for which $w\_1^{x\_1} \cdots w\_n^{x\_n} \in L$. For the interpretation, we use the fact that semilinearity of context-free languages yields a Presburger formula expressing $w\_1^{x\_1} \cdots w\_n^{x\_n} \in L$ for $(x\_1, \dots, x\_n) \in \mathbb{N}^n$. Moreover, Presburger definability of $w\_1^{x\_1} \cdots w\_n^{x\_n} \sqsubseteq w\_1^{y\_1} \cdots w\_n^{y\_n}$ for $(x\_1, \dots, x\_n) \in \mathbb{N}^n$ and $(y\_1, \dots, y\_n) \in \mathbb{N}^n$ is a simple consequence of the subword relation being rational, which was observed in [12]. The first-order interpretation of our structure in $(\mathbb{N}, +)$ then enables us to employ decidability of the C+MOD-theory of the latter structure [1,5,21]. (Note that this decidability does not follow directly from Presburger's result, since in first-order logic one cannot make statements like "the number of witnesses $x \in \mathbb{N}$ satisfying ... is even".) A similar interpretation in $(\mathbb{N}, +)$ was used in [6] for various algorithms concerning $(\Sigma^*, \sqsubseteq, (w)\_{w \in \Sigma^*})$ for fragments of FO related to bounded languages.

Our second result extends an approach from [12] for decidability of the FO<sup>2</sup> theory of the structure $(\Sigma^*, \sqsubseteq, (L)\_{L \text{ regular}})$. The authors of [12] provide a quantifier elimination procedure showing that every unary relation FO<sup>2</sup>-definable in this structure is regular. Our extended quantifier-elimination procedure uses the same invariant, now relying on the following two properties:

– The class of regular languages is closed under *counting* images under *unambiguous* rational relations.

This can be shown either directly or (as we do here) using weighted automata [20].

– The proper subword relation, the cover relation, and the incomparability relation are *unambiguous* rational.

Our third result extends the decidability of the Σ<sub>1</sub>-theory of (Σ<sup>∗</sup>, ⊑) from [17]. In [17], decidability is a consequence of the fact that every finite partial order can be embedded into (Σ<sup>∗</sup>, ⊑) if |Σ| ≥ 2. This certainly fails for general regular languages: (a<sup>∗</sup>, ⊑) can only accommodate linear orders. However, we can distinguish two cases: If L is a bounded language, then decidability of the Σ<sub>1</sub>-theory of (L, ⊑) follows from our first result. If L is not bounded, then we show that again every finite partial order embeds into (L, ⊑). To this end, we first extend a well-known property of unbounded regular languages, namely that there are $x, u, v, y \in \Sigma^*$ with $x\{u, v\}^* y \subseteq L$ such that $|u| = |v|$ and $u \neq v$. We show that here, $u, v$ can be chosen so that $uv$ is a primitive word. We then observe that for large enough $n$, any embedding of the word $(uv)^{n-1}$ into $(uv)^n$ must hit either the left-most position or the right-most position in $(uv)^n$. This enables us to argue that for large enough $n$, sending a tuple $(t\_1, \dots, t\_m) \in \{0,1\}^m$ to $x v^{t\_1}(uv)^n \cdots v^{t\_m}(uv)^n y$ is in fact an embedding of $(\{0,1\}^m, \leq)$ into (L, ⊑), where ≤ denotes coordinate-wise comparison. Since any partial order with at most $m$ elements embeds into $(\{0,1\}^m, \leq)$, this completes the proof.
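The word-level embedding can be illustrated concretely. The following Python sketch uses illustrative parameters of our own choosing (x = y = ε, u = "ab", v = "ba", n = 2, m = 3); we only check the easy monotone direction, that coordinate-wise s ≤ t implies embed(s) ⊑ embed(t), since the converse direction is exactly what requires primitivity of uv and large enough n:

```python
from itertools import product

def is_subword(u: str, v: str) -> bool:
    """Greedy test for the scattered-subword relation u ⊑ v."""
    it = iter(v)
    return all(c in it for c in u)

# Illustrative parameters (x = y = empty word, |u| = |v|, u != v):
x, y, u, v, n, m = "", "", "ab", "ba", 2, 3

def embed(t):
    """Map a tuple t ∈ {0,1}^m to x · v^{t_1}(uv)^n · ... · v^{t_m}(uv)^n · y."""
    return x + "".join(v * ti + (u + v) * n for ti in t) + y

# Monotonicity: deleting the optional occurrences of v yields a subword.
for s in product((0, 1), repeat=m):
    for t in product((0, 1), repeat=m):
        if all(a <= b for a, b in zip(s, t)):
            assert is_subword(embed(s), embed(t))
```

The assertions in the loop all pass, confirming the monotone direction for these parameters.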

Regarding our fourth result, we know from [6] that decidability of the Σ<sub>1</sub>-theory of (L, ⊑, (w)<sub>w∈L</sub>) does not hold for every regular L: undecidability holds already for L = {a, b}<sup>∗</sup>. Therefore, we require that every letter is frequent in L, meaning that in some automaton for L, every letter occurs in every cycle. In case L is bounded, we can again invoke our first result. If L is not bounded, we deduce from the frequency condition that for every $w \in \Sigma^*$, there are only finitely many words in L that do not have w as a subword. Removing those finitely many words preserves unboundedness, so that every finite partial order embeds into L above w. We then proceed to show that for such languages, any Σ<sub>1</sub>-sentence is effectively equivalent to a sentence where constants are only used to express that all variables take values above a certain word w. Since every finite partial order embeds above w, this implies decidability.

The full version of this work is available as [18].

### **2 Preliminaries**

Throughout this paper, let Σ be some finite alphabet. A word $u = a\_1 a\_2 \dots a\_m$ with $a\_1, a\_2, \dots, a\_m \in \Sigma$ is a *subword* of a word $v \in \Sigma^*$ if there are words $v\_0, v\_1, \dots, v\_m \in \Sigma^*$ with $v = v\_0 a\_1 v\_1 a\_2 v\_2 \cdots a\_m v\_m$. In that case, we write u ⊑ v; if, in addition, u ≠ v, then we write u ⊏ v and call u a *proper* subword of v. If $u, w \in \Sigma^*$ are such that u ⊏ w and there is no word v with u ⊏ v ⊏ w, then we say that w is a *cover* of u and write u ⊏· w. This is equivalent to saying u ⊑ w and |u| + 1 = |w|, where |u| is the length of the word u. If neither u is a subword of v nor *vice versa*, then the words u and v are *incomparable* and we write u ∥ v. For instance, aa ⊑ babbba, aa ⊏· aba, and aba ∥ aabb.
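These definitions translate directly into code; the following short Python sketch (function names are ours) checks the three examples above:

```python
def is_subword(u: str, v: str) -> bool:
    """u ⊑ v: u embeds into v as a scattered subword (greedy check)."""
    it = iter(v)
    return all(a in it for a in u)

def is_cover(u: str, w: str) -> bool:
    """u ⊏· w: equivalently, u ⊑ w and |u| + 1 = |w|."""
    return u != w and len(u) + 1 == len(w) and is_subword(u, w)

def incomparable(u: str, v: str) -> bool:
    """u ∥ v: neither word is a subword of the other."""
    return not is_subword(u, v) and not is_subword(v, u)

print(is_subword("aa", "babbba"))   # True:  aa ⊑ babbba
print(is_cover("aa", "aba"))        # True:  aa ⊏· aba
print(incomparable("aba", "aabb"))  # True:  aba ∥ aabb
```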

Let $\mathcal{S} = (L, (R\_i)\_{i \in I}, (w\_j)\_{j \in J})$ be a *structure*, i.e., $L$ is a set, $R\_i \subseteq L^{n\_i}$ is a relation of arity $n\_i$ (for all $i \in I$), and $w\_j \in L$ for all $j \in J$. Then, formulas $\varphi$ of the logic C+MOD are defined by the following grammar:

$$\varphi ::= (s = t) \mid R\_i(s\_1, \dots, s\_{n\_i}) \mid \neg \varphi \mid \varphi \lor \varphi \mid \exists x \, \varphi \mid \exists^{\geq k} x \, \varphi \mid \exists^{p \bmod q} x \, \varphi$$

where $s, t, s\_1, \dots, s\_{n\_i}$ are variables or constants $w\_j$ with $j \in J$, $i \in I$, $k \in \mathbb{N}$, and $p, q \in \mathbb{N}$ with $p < q$. We call $\exists^{\geq k}$ a *threshold counting quantifier* and $\exists^{p \bmod q}$ a *modulo counting quantifier*. The semantics of these quantifiers is defined as follows:

– $\mathcal{S} \models \exists^{\geq k} x \, \alpha$ iff $|\{w \in L \mid \mathcal{S} \models \alpha(w)\}| \geq k$
– $\mathcal{S} \models \exists^{p \bmod q} x \, \alpha$ iff $|\{w \in L \mid \mathcal{S} \models \alpha(w)\}| \in p + q\mathbb{N}$

For instance, $\exists^{0 \bmod 2} x \, \alpha$ expresses that the number of elements of the structure satisfying $\alpha$ is even. Then $(\exists^{0 \bmod 2} x \, \alpha) \lor (\exists^{1 \bmod 2} x \, \alpha)$ holds iff only finitely many elements of the structure satisfy $\alpha$. The fragment FO+MOD of C+MOD comprises all formulas not containing any threshold counting quantifier. First-order logic FO is the set of formulas from C+MOD not mentioning any counting quantifier. Let $\Sigma\_1$ denote the set of first-order formulas of the form $\exists x\_1 \, \exists x\_2 \ldots \exists x\_n \colon \psi$ where $\psi$ is quantifier-free; these formulas are also called *existential*.

The threshold quantifier $\exists^{\geq k}$ can be expressed using the existential quantifier only. Consequently, the logics FO+MOD and C+MOD are equally expressive. The situation changes when we restrict the number of variables that can be used in a formula: let FO+MOD² and C+MOD² denote the sets of formulas from FO+MOD and C+MOD, respectively, that use only the variables $x$ and $y$. Then the existence of ${\geq} 3$ elements in the structure is expressible in C+MOD², but not in FO+MOD².

In this paper, we will consider the following structures:


For any structure $\mathcal{S}$ and any of the logics $\mathcal{L}$, the $\mathcal{L}$-*theory* of $\mathcal{S}$ is the set of sentences from $\mathcal{L}$ that hold in $\mathcal{S}$.

A non-deterministic finite automaton is called *non-degenerate* if every state lies on a path from an initial to a final state. A language $L \subseteq \Sigma^\*$ is *bounded* if there are a number $n \in \mathbb{N}$ and words $w\_1, w\_2, \ldots, w\_n \in \Sigma^\*$ such that $L \subseteq w\_1^\* w\_2^\* \cdots w\_n^\*$. Otherwise, it is *unbounded*.

For a monoid $M$, a subset $S \subseteq M$ is called *rational* if it is a homomorphic image of a regular language. In other words, there exist an alphabet $\Delta$, a regular language $R \subseteq \Delta^\*$, and a homomorphism $h \colon \Delta^\* \to M$ with $S = h(R)$. In particular, if $\Sigma\_1, \Sigma\_2$ are alphabets and $M = \Sigma\_1^\* \times \Sigma\_2^\*$, then a subset $S \subseteq \Sigma\_1^\* \times \Sigma\_2^\*$ is rational iff there are an alphabet $\Delta$, a regular language $R \subseteq \Delta^\*$, and homomorphisms $h\_i \colon \Delta^\* \to \Sigma\_i^\*$ with $S = \{(h\_1(w), h\_2(w)) \mid w \in R\}$. This fact is known as *Nivat's theorem* [2].

For an alphabet $\Gamma$, a word $w \in \Gamma^\*$, and a letter $a \in \Gamma$, let $|w|\_a$ denote the number of occurrences of the letter $a$ in the word $w$. The *Parikh vector* of $w$ is the tuple $\Psi\_\Gamma(w) = (|w|\_a)\_{a \in \Gamma} \in \mathbb{N}^\Gamma$. Note that $\Psi\_\Gamma$ is a homomorphism from the free monoid $\Gamma^\*$ onto the additive monoid $(\mathbb{N}^\Gamma, +)$.
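As a quick illustration (ours, not part of the paper's development), the Parikh map and its homomorphism property can be checked directly:

```python
from collections import Counter

def parikh(w: str, alphabet: str) -> tuple:
    """Ψ_Γ(w) = (|w|_a)_{a ∈ Γ}, with components ordered by `alphabet`."""
    counts = Counter(w)
    return tuple(counts[a] for a in alphabet)

# Homomorphism property: Ψ(uv) = Ψ(u) + Ψ(v), componentwise.
gamma = "abc"
u, v = "abba", "cab"
pu, pv = parikh(u, gamma), parikh(v, gamma)
assert parikh(u + v, gamma) == tuple(x + y for x, y in zip(pu, pv))
assert pu == (2, 2, 0)
```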

#### **3 The FO+MOD-Theory with Regular Predicates**

The aim of this section is to prove that the full FO+MOD-theory of the structure

$$(L, \sqsubseteq, {\sqsubset\!\cdot}, (K \cap L)\_{K \text{ regular}}, (w)\_{w \in L})$$

is decidable for $L$ bounded and context-free. This is achieved by interpreting this structure in $(\mathbb{N}, +)$, i.e., in Presburger arithmetic, whose FO+MOD-theory is known to be decidable [1,5,21]. We start with three preparatory lemmas.

**Lemma 3.1.** *Let* $K \subseteq \Sigma^\*$ *be context-free,* $w\_1, \ldots, w\_n \in \Sigma^\*$*, and* $g \colon \mathbb{N}^n \to \Sigma^\*$ *be defined by* $g(m) = w\_1^{m\_1} w\_2^{m\_2} \cdots w\_n^{m\_n}$ *for all* $m = (m\_1, m\_2, \ldots, m\_n) \in \mathbb{N}^n$*. The set* $g^{-1}(K) = \{m \in \mathbb{N}^n \mid g(m) \in K\}$ *is effectively semilinear.*

*Proof.* Let $\Gamma = \{a\_1, a\_2, \ldots, a\_n\}$ be an alphabet and define the monoid homomorphism $f \colon \Gamma^\* \to \Sigma^\*$ by $f(a\_i) = w\_i$ for all $i \in [1, n]$.

Since the class of context-free languages is effectively closed under inverse homomorphisms and under intersections with regular languages, the language

$$L = f^{-1}(K) \cap a\_1^\* a\_2^\* \cdots a\_n^\* = \{ u \in a\_1^\* a\_2^\* \cdots a\_n^\* \mid f(u) \in K \}$$

is effectively context-free. Its Parikh image $\Psi\_\Gamma(L) \subseteq \mathbb{N}^n$ is effectively semilinear [19]. Moreover, $\Psi\_\Gamma(L)$ equals the set $g^{-1}(K)$ from the lemma.

**Lemma 3.2.** *Let* $w\_1, \ldots, w\_n \in \Sigma^\*$ *and* $g \colon \mathbb{N}^n \to \Sigma^\*$ *be defined by* $g(m) = w\_1^{m\_1} w\_2^{m\_2} \cdots w\_n^{m\_n}$ *for all* $m = (m\_1, m\_2, \ldots, m\_n) \in \mathbb{N}^n$*. The set* $\{(m, n') \in \mathbb{N}^n \times \mathbb{N}^n \mid g(m) \sqsubseteq g(n')\}$ *is semilinear.*

*Proof.* Let $\Gamma = \{a\_1, a\_2, \ldots, a\_n\}$ be an alphabet and define the monoid homomorphism $f \colon \Gamma^\* \to \Sigma^\*$ by $f(a\_i) = w\_i$ for all $i \in [1, n]$. One first shows that

$$S\_2 = \{(u, v) \mid u, v \in a\_1^\* a\_2^\* \cdots a\_n^\*, \; f(u) \sqsubseteq f(v)\}$$

is rational. We now employ Nivat's theorem. It tells us that there are a regular language $R$ over some alphabet $\Delta$ and two homomorphisms $h\_1, h\_2 \colon \Delta^\* \to \Gamma^\*$ so that we can write $S\_2 = \{(h\_1(w), h\_2(w)) \mid w \in R\}$. Since $R$ is regular, its Parikh image $\Psi\_\Delta(R) = \{\Psi\_\Delta(w) \mid w \in R\}$ is semilinear [19]. There are monoid homomorphisms $p\_1, p\_2 \colon \mathbb{N}^\Delta \to \mathbb{N}^n$ with $\Psi\_\Gamma(h\_i(w)) = p\_i(\Psi\_\Delta(w))$ for all $i \in \{1, 2\}$ and $w \in \Delta^\*$. With these, the image $H = \{(p\_1(\Psi\_\Delta(w)), p\_2(\Psi\_\Delta(w))) \mid w \in R\}$ of the set $\Psi\_\Delta(R)$ under the monoid homomorphism $(p\_1, p\_2) \colon \mathbb{N}^\Delta \to \mathbb{N}^n \times \mathbb{N}^n$ is semilinear. It turns out that this set equals the set from the lemma.

**Lemma 3.3.** *Let* $w\_1, w\_2, \ldots, w\_n \in \Sigma^\*$*,* $L \subseteq w\_1^\* w\_2^\* \cdots w\_n^\*$ *be context-free, and* $g \colon \mathbb{N}^n \to \Sigma^\*$ *be defined by* $g(m) = w\_1^{m\_1} w\_2^{m\_2} \cdots w\_n^{m\_n}$ *for every tuple* $m = (m\_1, m\_2, \ldots, m\_n) \in \mathbb{N}^n$*. Then there exists a semilinear set* $U \subseteq \mathbb{N}^n$ *such that* $g$ *maps* $U$ *bijectively onto* $L$*.*

*Proof.* The set $U$ contains, for each $u \in L$, the lexicographically minimal tuple $m \in \mathbb{N}^n$ with $g(m) = u$. Then, Lemmas 3.1 and 3.2 and the closure of the class of semilinear sets under first-order definitions imply the required properties.
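To make the choice of $U$ concrete, here is a small brute-force sketch (ours; the names `lexmin_preimages` and `bound` are illustrative, and the actual construction in the proof is symbolic, not enumerative). It enumerates exponent tuples in lexicographic order and keeps, for every word of $L$ encountered, the first tuple that produces it:

```python
from itertools import product

def lexmin_preimages(words, in_L, bound):
    """For g(m) = words[0]**m1 · words[1]**m2 · ..., collect the
    lexicographically least tuple m with g(m) = u for each u in L,
    searching exponents in range(bound) only."""
    def g(m):
        return "".join(w * e for w, e in zip(words, m))
    U = {}
    # product(...) yields tuples in lexicographic order, so the first
    # preimage found for a word is the lexicographically minimal one.
    for m in product(range(bound), repeat=len(words)):
        u = g(m)
        if in_L(u) and u not in U:
            U[u] = m
    return set(U.values())

# w1 = w2 = "a" and L = {words over "a" of length < 4}: here g is far
# from injective, but U keeps exactly one tuple per word of L.
U = lexmin_preimages(["a", "a"], lambda u: len(u) < 4, 4)
assert U == {(0, k) for k in range(4)}
```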

Now we can prove the main result of this section.

**Theorem 3.4.** *Let* $L \subseteq \Sigma^\*$ *be context-free and bounded. Then the* FO+MOD*-theory of* $(L, \sqsubseteq, {\sqsubset\!\cdot}, (K \cap L)\_{K \text{ regular}}, (w)\_{w \in L})$ *is decidable.*

*Proof.* It suffices to prove the decidability for the structure $\mathcal{S} = (L, \sqsubseteq, (K \cap L)\_{K \text{ regular}})$ since the theory of the structure from the theorem can be reduced to that of $\mathcal{S}$ ($x \mathrel{\sqsubset\!\cdot} y$ gets replaced by its definition and $x \mathrel{\theta} w$ by $\exists y \colon y \in \{w\} \land x \mathrel{\theta} y$ where $\theta$ is any binary relation symbol).

Since $L$ is bounded, there are words $w\_1, w\_2, \ldots, w\_n \in \Sigma^\*$ such that $L$ is included in $w\_1^\* w\_2^\* \cdots w\_n^\*$. For an $n$-tuple $m = (m\_1, m\_2, \ldots, m\_n) \in \mathbb{N}^n$ we define $g(m) = w\_1^{m\_1} w\_2^{m\_2} \cdots w\_n^{m\_n} \in \Sigma^\*$.


From these semilinear sets, we obtain first-order formulas $\lambda(\overline{x})$, $\sigma(\overline{x}, \overline{y})$, and $\kappa\_K(\overline{x})$ in the language of $(\mathbb{N}, +)$ such that, for any $\overline{m}, \overline{n} \in \mathbb{N}^n$, we have

1. $(\mathbb{N}, +) \models \lambda(\overline{m}) \iff \overline{m} \in U$,
2. $(\mathbb{N}, +) \models \sigma(\overline{m}, \overline{n}) \iff g(\overline{m}) \sqsubseteq g(\overline{n})$, and
3. $(\mathbb{N}, +) \models \kappa\_K(\overline{m}) \iff g(\overline{m}) \in K$.

One then defines, from an FO+MOD-formula $\varphi(x\_1, \ldots, x\_k)$ in the language of $\mathcal{S}$, an FO+MOD-formula $\varphi'(\overline{x\_1}, \ldots, \overline{x\_k})$ in the language of $(\mathbb{N}, +)$ such that

$$(\mathbb{N}, +) \models \varphi'(\overline{m\_1}, \dots, \overline{m\_k}) \iff \mathcal{S} \models \varphi(g(\overline{m\_1}), \dots, g(\overline{m\_k})).$$

(This construction can be found in the full version [18] and increases the formula size at least exponentially.)

Consequently, any sentence $\varphi$ from FO+MOD in the language of $\mathcal{S}$ is translated into an equivalent sentence $\varphi'$ in the language of $(\mathbb{N}, +)$. By [1,5,21], validity of the sentence $\varphi'$ in $(\mathbb{N}, +)$ is decidable.

### **4 The C+MOD²-Theory with Regular Predicates**

It is the aim of this section to show that the C+MOD²-theory of the structure $(L, \sqsubseteq, {\sqsubset\!\cdot}, (K \cap L)\_{K \text{ regular}}, (w)\_{w \in L})$ is decidable for any regular language $L$. To this aim, we first show that the C+MOD²-theory of

$$\mathcal{S} = (\Sigma^\*, \sqsubseteq, {\sqsubset\!\cdot}, (L)\_{L \text{ regular}})$$

is decidable. This decidability proof extends the proof from [12] for the decidability of the FO²-theory of $(\Sigma^\*, \sqsubseteq, (L)\_{L \text{ regular}})$. It provides a quantifier-elimination procedure (see Sect. 4.3) that relies on the following two properties:


#### **4.1 Unambiguous Rational Relations**

Recall that, by Nivat's theorem, a relation $R \subseteq \Sigma^\* \times \Sigma^\*$ is rational if there exist an alphabet $\Gamma$, a homomorphism $h \colon \Gamma^\* \to \Sigma^\* \times \Sigma^\*$, and a regular language $S \subseteq \Gamma^\*$ such that $h$ maps $S$ surjectively onto $R$. We call $R$ an *unambiguous rational relation* if, in addition, $h$ maps $S$ *injectively* (and therefore bijectively) onto $R$. Note that these are precisely the relations accepted by unambiguous 2-tape automata.

While the class of rational relations is closed under unions, this is not the case for unambiguous rational relations (e.g., $R = \{(a^m b a^n, a^m) \mid m, n \in \mathbb{N}\} \cup \{(a^m b a^n, a^n) \mid m, n \in \mathbb{N}\}$ is the union of unambiguous rational relations but not unambiguous). But it is closed under *disjoint* unions.

**Lemma 4.1.** *For any alphabet* $\Sigma$*, the cover relation* ${\sqsubset\!\cdot}$ *and the relation* ${\sqsubseteq} \setminus {\sqsubset\!\cdot}$ *are unambiguous rational.*

*Proof.* For $i \in \{1, 2\}$, let $\Sigma\_i = \Sigma \times \{i\}$ and $\Gamma = \Sigma\_1 \cup \Sigma\_2$. Furthermore, let the homomorphism $\mathrm{proj}\_i \colon \Gamma^\* \to \Sigma^\*$ be defined by $\mathrm{proj}\_i(a, i) = a$ and $\mathrm{proj}\_i(a, 3-i) = \varepsilon$ for all $a \in \Sigma$. Finally, let the homomorphism $\mathrm{proj} \colon \Gamma^\* \to \Sigma^\* \times \Sigma^\*$ be defined by $\mathrm{proj}(w) = (\mathrm{proj}\_1(w), \mathrm{proj}\_2(w))$.

– The regular language

$$\mathrm{Sub} = \left( \bigcup\_{a \in \Sigma} \left( \left( \Sigma\_2 \setminus \{ (a, 2) \} \right)^\* (a, 2) \, (a, 1) \right) \right)^\* \Sigma\_2^\*$$

is mapped bijectively onto the subword relation.
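The bijectivity rests on the fact that the blocks $(\Sigma\_2 \setminus \{(a,2)\})^\* (a,2)(a,1)$ force each letter of the first word to be matched with its *leftmost* possible occurrence in the second. The following sketch (the helper `encode` is ours) computes this canonical witness:

```python
def encode(u: str, v: str):
    """If u ⊑ v, return the unique word over Σ×{1,2} in Sub that proj maps
    to (u, v); each letter of u is matched with its leftmost possible
    occurrence in v.  Returns None if u is not a subword of v."""
    out, j = [], 0
    for a in u:
        while j < len(v) and v[j] != a:   # skipped letters of v: tag 2 only
            out.append((v[j], 2))
            j += 1
        if j == len(v):
            return None                   # a cannot be matched: u ⋢ v
        out.append((a, 2))                # the matched occurrence in v ...
        out.append((a, 1))                # ... immediately followed by tag 1
        j += 1
    out.extend((b, 2) for b in v[j:])     # trailing block Σ₂*
    return out

w = encode("aa", "aba")
assert "".join(a for a, i in w if i == 1) == "aa"    # proj₁(w) = u
assert "".join(a for a, i in w if i == 2) == "aba"   # proj₂(w) = v
assert encode("aba", "aabb") is None                 # aba ⋢ aabb
```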


**Lemma 4.2.** *For any alphabet* $\Sigma$*, the incomparability relation*

$$\parallel \; = \; \left\{ (u, v) \in \Sigma^\* \times \Sigma^\* \mid \textit{neither}\ u \sqsubseteq v\ \textit{nor}\ v \sqsubseteq u \right\}$$

*is unambiguous rational.*

*Proof.* We will show that the following three relations are unambiguous rational:

1. $R\_1 = \{(u, v) \mid |u| < |v| \text{ and not } u \sqsubseteq v\}$,
2. $R\_2 = \{(u, v) \mid |u| = |v| \text{ and } u \neq v\}$, and
3. $R\_3 = \{(u, v) \mid |u| > |v| \text{ and not } v \sqsubseteq u\}$.

The result follows since $\parallel$ is the disjoint union of these relations. Let $\Sigma\_i$, $\Gamma$, $\mathrm{proj}\_i$, and $\mathrm{proj}$ be defined as in the previous proof. First, the regular language

$$\text{Inc}\_2 = (\Sigma\_2 \Sigma\_1)^\* \cdot \{ (a, 2)(b, 1) \mid a, b \in \Sigma, a \neq b \} \cdot (\Sigma\_2 \Sigma\_1)^\*.$$

is mapped by $\mathrm{proj}$ bijectively onto $R\_2$.

From [12, Lemma 5.2], we learn that $(u, v) \in R\_1 \cup R\_2$ if, and only if,

– $u = a\_1 a\_2 \cdots a\_\ell u'$ for some $\ell \geq 1$, $a\_1, \ldots, a\_\ell \in \Sigma$, $u' \in \Sigma^\*$, and
– $v \in (\Sigma \setminus \{a\_1\})^\* a\_1 \, (\Sigma \setminus \{a\_2\})^\* a\_2 \cdots (\Sigma \setminus \{a\_{\ell-1}\})^\* a\_{\ell-1} \, (\Sigma \setminus \{a\_\ell\})^+ v'$ for some word $v' \in \Sigma^\*$ with $|u'| = |v'|$.

Consequently, $\mathrm{proj}$ maps the following language bijectively onto $R\_1 \cup R\_2$:

$$\mathrm{Inc}\_{1,2} = \left(\bigcup\_{a \in \Sigma} \left( \left( \Sigma\_2 \setminus \{ (a,2) \} \right)^\* (a,2)(a,1) \right) \right)^\* \cdot \bigcup\_{a \in \Sigma} \left( \left( \Sigma\_2 \setminus \{ (a,2) \} \right)^+ (a,1) \right) \cdot \left( \Sigma\_2 \Sigma\_1 \right)^\*$$

and since $\mathrm{Inc}\_2 \subseteq \mathrm{Inc}\_{1,2}$, $\mathrm{proj}$ maps $\mathrm{Inc}\_1 = \mathrm{Inc}\_{1,2} \setminus \mathrm{Inc}\_2$ bijectively onto $R\_1$. The claim regarding $R\_3$ follows analogously.

#### **4.2 Closure Properties of the Class of Regular Languages**

Let $R \subseteq \Sigma^\* \times \Sigma^\*$ be an unambiguous rational relation and $L \subseteq \Sigma^\*$ a regular language. We want to show that the languages of all words $u \in \Sigma^\*$

$$\text{with } |\{v \in L \mid (u, v) \in R\}| \ge k \tag{1}$$

$$\text{with } |\{v \in L \mid (u, v) \in R\}| \in p + q\mathbb{N}, \text{ respectively} \tag{2}$$

are effectively regular for all $k \in \mathbb{N}$ and all $0 \leq p < q$, respectively (this does not hold for arbitrary rational relations). It is straightforward to work out direct automata constructions for this. However, the full details of this are somewhat cumbersome. Instead, we provide a proof via weighted automata, which enables us to split the two constructions into several simple steps.

Let $S$ be a semiring. A function $r \colon \Sigma^\* \to S$ is *realizable over* $S$ if there are $n \in \mathbb{N}$, $\lambda \in S^{1 \times n}$, a homomorphism $\mu \colon \Sigma^\* \to S^{n \times n}$, and $\nu \in S^{n \times 1}$ with $r(w) = \lambda \cdot \mu(w) \cdot \nu$ for all $w \in \Sigma^\*$. The triple $(\lambda, \mu, \nu)$ is a *presentation of dimension* $n$ or a *weighted automaton for* $r$.
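As a concrete illustration (ours, under the assumption $S = (\mathbb{N}, +, \cdot)$), here is a 2-dimensional presentation realizing $r(w) = |w|\_a$ over $\Sigma = \{a, b\}$:

```python
def mat_mul(A, B):
    """Matrix product over the semiring (ℕ, +, ·)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def realize(lam, mu, nu, w):
    """Evaluate r(w) = λ · μ(w) · ν, multiplying letter by letter
    (μ is a homomorphism, so μ(w) is the product of the μ(letter))."""
    row = [lam]                 # 1×n row vector λ
    for a in w:
        row = mat_mul(row, mu[a])
    return mat_mul(row, nu)[0][0]

# A presentation for r(w) = |w|_a: entry (0,1) of μ(w) counts the a's.
lam = [1, 0]
mu = {"a": [[1, 1], [0, 1]], "b": [[1, 0], [0, 1]]}
nu = [[0], [1]]
assert realize(lam, mu, nu, "abaab") == 3
```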

In the following, we consider the semiring $\mathbb{N}^\infty$, i.e., the set $\mathbb{N} \cup \{\infty\}$ together with the commutative operations $+$ and $\cdot$ (with $x + \infty = \infty$ for all $x \in \mathbb{N} \cup \{\infty\}$, $x \cdot \infty = \infty$ for all $x \in (\mathbb{N} \cup \{\infty\}) \setminus \{0\}$, and $0 \cdot \infty = 0$). Sometimes, we will argue about sums of infinitely many elements from $\mathbb{N}^\infty$, which are defined as expected.
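A minimal sketch (ours) of these operations; the only non-obvious case is $0 \cdot \infty = 0$, which Python's floating-point infinity would otherwise turn into NaN:

```python
INF = float("inf")  # stand-in for ∞; elements of ℕ∞ are naturals or INF

def add(x, y):
    """Addition in ℕ∞: x + ∞ = ∞ for every x."""
    return x + y

def mul(x, y):
    """Multiplication in ℕ∞: 0 · ∞ = ∞ · 0 = 0, otherwise as usual."""
    if x == 0 or y == 0:
        return 0
    return x * y

assert mul(0, INF) == 0      # the special case of the definition
assert mul(3, INF) == INF
assert add(0, INF) == INF
```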

**Proposition 4.3.** *Let* $\Gamma$ *and* $\Sigma$ *be alphabets,* $f \colon \Gamma^\* \to \Sigma^\*$ *a homomorphism, and* $\chi \colon \Gamma^\* \to \mathbb{N}^\infty$ *a realizable function over* $\mathbb{N}^\infty$*. Then the following function* $r$ *is effectively realizable over* $\mathbb{N}^\infty$*:*

$$r = \chi \circ f^{-1} \colon \Sigma^\* \to \mathbb{N}^{\infty} \colon u \mapsto \sum\_{\substack{w \in \Gamma^\* \\ f(w) = u}} \chi(w)$$

*Proof.* The homomorphism $f$ can be written as $f = f\_2 \circ f\_1$ where $f\_1 \colon \Gamma^\* \to \Gamma^\*$ is non-expanding (i.e., $f\_1(a) \in \Gamma \cup \{\varepsilon\}$ for all $a \in \Gamma$) and $f\_2 \colon \Gamma^\* \to \Sigma^\*$ is non-erasing (i.e., $f\_2(a) \in \Sigma^+$ for all $a \in \Gamma$). Then $r = (\chi \circ f\_1^{-1}) \circ f\_2^{-1}$, and $\chi' = \chi \circ f\_1^{-1}$ is effectively realizable by [3, Lemma 2.2(b)].

Let $(\lambda, \mu, \nu)$ be a presentation of dimension $n$ for $\chi'$. For $\sigma \in \Sigma \cup \{\varepsilon\}$, set $\Gamma\_\sigma = \{b \in \Gamma \mid f\_2(b) = \sigma\}$. Furthermore, define the matrix $M \in (\mathbb{N}^\infty)^{n \times n}$ by

$$M\_{ij} = \begin{cases} \infty & \text{if there is } w \in \varGamma\_{\varepsilon}^{\*} \text{ with } n < |w| \le 2n \text{ and } \mu(w)\_{ij} > 0\\ \sum\_{w \in \varGamma\_{\varepsilon}^{\leq n}} \mu(w)\_{ij} & \text{otherwise.} \end{cases}$$

Then $M\_{ij} = \sum\_{w \in \Gamma\_\varepsilon^\*} \mu(w)\_{ij}$ for all $i, j \in [1, n]$. Setting $\lambda' = \lambda \cdot M$ and

$$\mu'(a) = \sum\_{b \in \Gamma\_a} \left( \mu(b) \cdot M \right) \text{ for all } a \in \Sigma$$

defines the presentation $(\lambda', \mu', \nu)$ for the function $r = \chi' \circ f\_2^{-1}$.

**Lemma 4.4.** *Let* $R \subseteq \Sigma^\* \times \Sigma^\*$ *be an unambiguous rational relation and* $L \subseteq \Sigma^\*$ *be regular. Then the following function* $r$ *is effectively realizable over* $\mathbb{N}^\infty$*:*

$$r \colon \Sigma^\* \to \mathbb{N}^\infty \colon u \mapsto |\{v \in L \mid (u, v) \in R\}|$$

*Proof.* Since $R$ is unambiguous rational, so is $R \cap (\Sigma^\* \times L)$, i.e., there are an alphabet $\Gamma$, homomorphisms $f, g \colon \Gamma^\* \to \Sigma^\*$, and a regular language $S\_L \subseteq \Gamma^\*$ such that

$$(f, g) \colon \Gamma^\* \to \Sigma^\* \times \Sigma^\* \colon w \mapsto \left( f(w), g(w) \right)$$

maps $S\_L$ bijectively onto $R \cap (\Sigma^\* \times L)$. Since $S\_L$ is regular, its characteristic function $\chi$ is effectively realizable by [20, Prop. 3.12]. One then shows that $r$ is the function $\chi \circ f^{-1}$ as in Proposition 4.3.

We now come to the main result of this section.

**Proposition 4.5.** *Let* $R \subseteq \Sigma^\* \times \Sigma^\*$ *be an unambiguous rational relation and* $L \subseteq \Sigma^\*$ *be regular. Then, for* $k \in \mathbb{N}$ *and for* $p, q \in \mathbb{N}$ *with* $p < q$*, the set* $H$ *of words* $u$ *satisfying* (1) *and* (2)*, respectively, is effectively regular.*

Let $R$ denote the rational relation mentioned before Lemma 4.1. Then a word $a^m b a^n$ has ${\geq} 2$ "$R$-partners" iff it has an even number of "$R$-partners" iff $m \neq n$. Hence, the above proposition does not hold for arbitrary rational relations.

*Proof.* Let $r$ be the function from Lemma 4.4. Setting $x \equiv y$ iff $x = y$ or $k \leq x, y < \infty$ defines a congruence $\equiv$ on $\mathbb{N}^\infty$. Then $S\_k^\infty = \mathbb{N}^\infty/{\equiv}$ is a finite semiring and the function $s \colon \Sigma^\* \to S\_k^\infty \colon u \mapsto [r(u)]$ is effectively realizable. Since the semiring $S\_k^\infty$ is finite, the "level sets" $s^{-1}([i]) = \{u \in \Sigma^\* \mid s(u) \equiv i\}$ are effectively regular by [20, Prop. 4.5]. Since $s^{-1}([k]) \cup s^{-1}([\infty])$ is the language of words $u$ satisfying (1), the first result follows.

For the second language, we consider the congruence ${\equiv} \subseteq \mathbb{N}^\infty \times \mathbb{N}^\infty$ with $x \equiv y$ iff $x = y$, or $q \leq x, y < \infty$ and $q$ divides $x - y$.
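Both quotients are finite, which is what makes the level sets computable. A small sketch (ours) of canonical representatives for the two congruences:

```python
INF = float("inf")  # stand-in for ∞ in ℕ∞

def cls_threshold(x, k):
    """Representative of [x] in ℕ∞/≡ where x ≡ y iff x = y or k ≤ x, y < ∞:
    all finite values ≥ k collapse into one class, represented by k."""
    return x if x == INF or x < k else k

def cls_modulo(x, q):
    """Representative of [x] for x ≡ y iff x = y, or q ≤ x, y < ∞ and
    q divides x − y: finite values ≥ q collapse by residue into [q, 2q)."""
    return x if x == INF or x < q else q + (x - q) % q

assert cls_threshold(7, 3) == cls_threshold(3, 3) == 3   # both finite, ≥ 3
assert cls_threshold(INF, 3) == INF                      # ∞ is its own class
assert cls_modulo(11, 4) == cls_modulo(7, 4)             # 11 ≡ 7 (mod 4)
assert cls_modulo(2, 4) == 2                             # below q: unchanged
```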

#### **4.3 Quantifier Elimination for C+MOD<sup>2</sup>**

Our decision algorithm employs a quantifier elimination procedure, i.e., we will transform an arbitrary formula into an equivalent one that is quantifier-free. As usual, the heart of this procedure handles formulas $\psi = Q y \, \varphi$ where $Q$ is a quantifier and $\varphi$ is quantifier-free. Since the logic C+MOD² has only two variables, any such formula $\psi$ has at most one free variable. In other words, it defines a language $K$. The following lemma shows that this language is effectively regular, such that $\psi$ is equivalent to the quantifier-free formula $x \in K$.

**Lemma 4.6.** *Let* $\varphi(x, y)$ *be a quantifier-free formula from* C+MOD² *in the language of the structure* $\mathcal{S} = (\Sigma^\*, \sqsubseteq, {\sqsubset\!\cdot}, (L)\_{L \text{ regular}})$*. Then the sets*

$$\{x \in \Sigma^\* \mid \mathcal{S} \models \exists^{\geq k} y \,\varphi\} \text{ and } \{x \in \Sigma^\* \mid \mathcal{S} \models \exists^{p \bmod q} y \,\varphi\}$$

*are effectively regular for all* $k \in \mathbb{N}$ *and all* $p, q \in \mathbb{N}$ *with* $p < q$*.*

*Proof.* Since $\varphi$ is quantifier-free, we can rewrite it into a Boolean combination of formulas of the form $x \in K$ and $y \in L$ for some regular languages $K$ and $L$, $x \sqsubseteq y$ and $y \sqsubseteq x$, and $x \mathrel{\sqsubset\!\cdot} y$ and $y \mathrel{\sqsubset\!\cdot} x$.

There are six possible relations between the two variables $x$ and $y$ in the partial order: we can have $x = y$, $x \mathrel{\sqsubset\!\cdot} y$ or *vice versa*, $x \sqsubset y \land \neg(x \mathrel{\sqsubset\!\cdot} y)$ or *vice versa*, or $x \parallel y$. Let $\theta\_i(x, y)$ for $1 \leq i \leq 6$ be formulas describing these relations.

Hence $\varphi$ is equivalent to $\bigvee\_{1 \leq i \leq 6} (\theta\_i \land \varphi)$. In this formula, any occurrence of $\varphi$ appears in conjunction with precisely one of the formulas $\theta\_i$. Depending on this formula $\theta\_i$ (i.e., the relation between $x$ and $y$), we can simplify $\varphi$ to $\varphi\_i$ by replacing the atomic subformulas that compare $x$ and $y$ by true or false. As a result, the formula $\varphi$ is equivalent to $\bigvee\_{1 \leq i \leq 6} (\theta\_i \land \varphi\_i)$ where the formulas $\varphi\_i$ are Boolean combinations of formulas of the form $x \in K$ and $y \in L$ for some regular languages $K$ and $L$.

Now let $k \in \mathbb{N}$. Since the formulas $\theta\_i$ are mutually exclusive, we get

$$\exists^{\geq k} y \,\varphi \;\equiv\; \exists^{\geq k} y \bigvee\_{1 \leq i \leq 6} (\theta\_i \wedge \varphi\_i) \;\equiv\; \bigvee\_{(\*)} \bigwedge\_{1 \leq i \leq 6} \exists^{\geq k\_i} y \,(\theta\_i \wedge \varphi\_i)$$

where the disjunction $(\*)$ extends over all $(k\_1, \ldots, k\_6) \in \mathbb{N}^6$ with $\sum\_{1 \leq i \leq 6} k\_i = k$.

Hence it suffices to show that

$$\{x \in \Sigma^\* \mid \exists^{\geq k} y \,(\theta\_i \land \varphi)\}\tag{3}$$

is effectively regular for all $1 \leq i \leq 6$, all $k \in \mathbb{N}$, and all Boolean combinations $\varphi$ of formulas of the form $x \in K$ and $y \in L$ where $K$ and $L$ are regular languages. We can find regular languages $K\_M$ and $L\_M$ and a finite set $I$ such that $\varphi$ is equivalent to $\bigvee\_{M \in I} (x \in K\_M \land y \in L\_M)$ and such that this disjunction is exclusive. Hence the set from (3) equals the union of the sets

$$\{x \in \Sigma^\* \mid \exists^{\geq k} y \left(\theta\_i \land x \in K\_M \land y \in L\_M\right)\} = K\_M \cap \underbrace{\{x \in \Sigma^\* \mid \exists^{\geq k} y \in L\_M \colon \theta\_i\}}\_{H\_M}$$

for $M \in I$. The set $H\_M$ is effectively regular by Proposition 4.5 and Lemmas 4.1 and 4.2. Since the language in the claim of the lemma is a Boolean combination of such sets, the first claim is demonstrated; the second follows similarly.

The only atomic formulas with a single variable $x$ are $x \in L$ with $L$ regular, $x = x$ and $x \sqsubseteq x$ (which are equivalent to $x \in \Sigma^\*$), and $x \mathrel{\sqsubset\!\cdot} x$ (which is equivalent to $x \in \emptyset$). Hence, any quantifier-free formula with a single free variable $x$ is a Boolean combination of statements of the form $x \in L$. Lemma 4.6 thus implies:

**Theorem 4.7.** *Let* $\mathcal{S} = (\Sigma^\*, \sqsubseteq, {\sqsubset\!\cdot}, (L)\_{L \text{ regular}})$*. Let* $\varphi(x)$ *be a formula from* C+MOD²*. Then the set* $\{x \in \Sigma^\* \mid \mathcal{S} \models \varphi\}$ *is effectively regular.*

**Corollary 4.8.** *Let* $L \subseteq \Sigma^\*$ *be a regular language. Then the* C+MOD²*-theory of the structure* $\mathcal{S}\_L = (L, \sqsubseteq, {\sqsubset\!\cdot}, (K \cap L)\_{K \text{ regular}}, (w)\_{w \in L})$ *is decidable.*

*Proof.* Let $\varphi \in$ C+MOD² be a sentence. We build $\varphi\_L$ by (1) restricting all quantifications to $L$ and (2) replacing $x \mathrel{\theta} w$ by $\exists y \colon y \in \{w\} \land x \mathrel{\theta} y$, and dually for $y \mathrel{\theta} w$, for all $w \in L$ and all binary relation symbols $\theta$.

With $\mathcal{S}$ the structure from Theorem 4.7, we obtain $\mathcal{S} \models \varphi\_L \iff \mathcal{S}\_L \models \varphi$. By Theorem 4.7, the language $\{x \mid \mathcal{S} \models \varphi\_L\}$ is regular (since $\varphi\_L$ is a sentence, it is $\emptyset$ or $\Sigma^\*$). Hence $\varphi\_L$ holds iff this set is nonempty, which is decidable.

### **5 The $\Sigma\_1$-Theory**

In this section, we study for which regular languages $L$ the $\Sigma\_1$-theory of the structure $(L, \sqsubseteq)$ is decidable. If $L$ is bounded, then decidability follows from Theorem 3.4. In the case of $(\Sigma^\*, \sqsubseteq)$, decidability is known as well [17]. Here, we prove decidability for every regular language $L$. Note that in terms of quantifier block alternation, this is optimal: the $\Sigma\_2$-theory is undecidable already in the simple case of $(\{a, b\}^\*, \sqsubseteq)$ [6].

**Theorem 5.1.** *For every regular* $L \subseteq \Sigma^\*$*, the* $\Sigma\_1$*-theory of* $(L, \sqsubseteq)$ *is decidable.*

Observe that, very generally, the $\Sigma\_1$-theory of a partially ordered set $(P, \leq)$ is decidable if every finite partial order embeds into $(P, \leq)$: in that case, a formula with $n$ variables is satisfied in $(P, \leq)$ if and only if it is satisfied in some finite partial order with at most $n$ elements. This is used to obtain decidability for the case $L = \Sigma^\*$ with $|\Sigma| \geq 2$ in [17].

As mentioned above, if $L$ is bounded, decidability follows from Theorem 3.4. If $L$ is unbounded, it is well known that there is a subset $x\{p, q\}^\* y \subseteq L$ such that $|p| = |q|$ and $p \neq q$ (see Lemma 5.2). Since in that case the monoids $(\{a, b\}^\*, \cdot)$ and $(\{p, q\}^\*, \cdot)$ are isomorphic, it is tempting to assume that $(\{a, b\}^\*, \sqsubseteq)$ embeds into $(\{p, q\}^\*, \sqsubseteq)$ and thus into $(x\{p, q\}^\* y, \sqsubseteq)$. However, that is not the case. If $L = \{ab, ba\}^\*$, then the downward closure of any infinite subset of $L$ includes all of $L$. Since, on the other hand, $(\{a, b\}^\*, \sqsubseteq)$ has infinite downward closed strict subsets such as $a^\*$, it cannot embed into $(L, \sqsubseteq)$. Nevertheless, the rest of this section demonstrates that every finite partial order embeds into $(L, \sqsubseteq)$ whenever $L$ is an unbounded regular language. By the previous paragraph, this implies Theorem 5.1.

We recall a well-known property of unbounded regular languages.

**Lemma 5.2.** *If* $L \subseteq \Sigma^\*$ *is not bounded, then there are* $x, y, p, q \in \Sigma^\*$ *such that* $|p| = |q|$*,* $p \neq q$*, and* $x\{p, q\}^\* y \subseteq L$*.*

*Proof.* Let $\mathcal{A}$ be any non-degenerate deterministic finite automaton accepting $L$. Then at least one strongly connected component of $\mathcal{A}$ is not a cycle since otherwise, $L$ would be bounded. Hence, there are a state $s$ and prefix-incomparable words $u, v$, each of which is read on a cycle starting in $s$. Since $u$ and $v$ are prefix-incomparable, the words $p = uv$ and $q = vu$ are distinct, but equally long. Since $\mathcal{A}$ is non-degenerate, there are words $x, y \in \Sigma^\*$ with $x\{p, q\}^\* y \subseteq L$.
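The key step, that $p = uv$ and $q = vu$ are distinct but equally long whenever $u$ and $v$ are prefix-incomparable, is easy to check on examples (the helper names below are ours):

```python
def prefix_incomparable(u: str, v: str) -> bool:
    """Neither u is a prefix of v nor v a prefix of u."""
    return not u.startswith(v) and not v.startswith(u)

def swap_pair(u: str, v: str):
    """The words p = uv and q = vu from the proof of Lemma 5.2."""
    return u + v, v + u

u, v = "ab", "ba"
assert prefix_incomparable(u, v)
p, q = swap_pair(u, v)
assert len(p) == len(q) and p != q       # |p| = |q| and p ≠ q
```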

To have some control over how words can embed, we prove a stronger version of Lemma 5.2. Two words p, q ∈ Σ∗ are *conjugate* if there are x, y ∈ Σ∗ with p = xy and q = yx. A word p ∈ Σ∗ is *primitive* if there is no q ∈ Σ∗ with p ∈ qq⁺.

**Proposition 5.3.** *For every unbounded regular language L ⊆ Σ∗, there are x, u, v, y ∈ Σ∗ such that |u| = |v|, the word uv is primitive, and x{u, v}∗y ⊆ L.*

*Proof.* Since L is unbounded and regular, Lemma 5.2 yields words x, y, p, q ∈ Σ∗ with |p| = |q|, p ≠ q, and x{p, q}∗y ⊆ L. Then the words r = pq and s = pp are not conjugate, because every conjugate of a square is a square. Moreover, |r| = |s|, and x{r, s}∗y ⊆ x{p, q}∗y ⊆ L. Let n = |r|, u = rs^{n−1}, and v = s^n. Towards a contradiction, suppose uv = rs^{2n−1} is not primitive. Then there is a word w ∈ Σ∗ with rs^{2n−1} ∈ ww⁺. Depending on whether |w| ≥ n or |w| < n, we have n ≤ |w^t| ≤ n² either for t = 1 or for t = n. It follows that r is a prefix of w^t and that w^t is a suffix of s^n, implying that r is a factor of s^n. Since r and s are not conjugate, this is impossible.
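The construction in this proof can be checked mechanically on a small instance. `is_primitive` uses the classic doubling trick (w is primitive iff w occurs in ww only at positions 0 and |w|); the starting pair p = ab, q = ba is our own choice of example:

```python
def is_primitive(w):
    # Classic test: w is a proper power iff it occurs inside (w + w)
    # at a position strictly between 0 and len(w).
    return (w + w).find(w, 1) == len(w)

p, q = "ab", "ba"            # |p| = |q| and p != q, as given by Lemma 5.2
r, s = p + q, p + p          # r = pq and s = pp are not conjugate
n = len(r)
u, v = r + s * (n - 1), s * n
assert len(u) == len(v)
assert not is_primitive(s)   # s = pp is a square, hence not primitive
assert is_primitive(u + v)   # uv = r s^(2n-1) is primitive, as claimed
```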

We are now ready to describe how to embed a finite partial order into (L, ⊑). Observe that every finite partial order with m elements embeds into ({0, 1}^m, ≤), where ≤ is the componentwise order. Hence, it suffices to embed this partial order into ({u, v}∗, ⊑). We do this as follows. Let n = |uv| + m + 3 and define, for a tuple t = (t_1, …, t_m) ∈ {0, 1}^m,

$$
\varphi_m(t_1, \ldots, t_m) = v^{t_1} (uv)^n \cdots v^{t_m} (uv)^n \,.
$$

Then, clearly, s ≤ t implies ϕ_m(s) ⊑ ϕ_m(t). The converse requires a careful analysis of how prefixes of ϕ_m(s) can embed into prefixes of ϕ_m(t). For x, y ∈ Σ∗, we write x → y if x, but no word xa with a ∈ Σ, is a subword of y. In other words, x → y if x is a *prefix-maximal subword* of y. This gives us a criterion for non-embeddability: if x has a strict prefix x₀ with x₀ → y, then certainly x ⋢ y. In this case, the word x₁ with x = x₀x₁ is called the *residue*. We show the following:

**Lemma 5.4.** *Let u, v ∈ Σ∗ be words such that |u| = |v| and uv is primitive. Then, for all ℓ, n ∈ ℕ with n > |uv| + ℓ + 2, we have*

$$\begin{array}{l}(i) \ (uv)^{n} \rightarrow v(uv)^{n},\\(ii) \ (uv)^{\ell}v(uv)^{n-\ell-1} \rightarrow (uv)^{n}, \text{ and}\\(iii) \ (uv)^{1+\ell}v(uv)^{n-\ell-2} \rightarrow v(uv)^{n}.\end{array}$$

For this lemma, it is crucial to observe that for a primitive word w and n > |w| + 1, any embedding of w^{n−1} into w^n must either hit the left-most or the right-most position of w^n. To conclude that s ≰ t implies ϕ_m(s) ⋢ ϕ_m(t), we argue about prefixes of the form p_i = v^{s_1}(uv)^n ⋯ v^{s_i}(uv)^n and q_i = v^{t_1}(uv)^n ⋯ v^{t_i}(uv)^n for i ∈ [1, m]. If s ≰ t, let i ∈ [1, m] be the index with s_i = 1, t_i = 0, and s_j ≤ t_j for all j ∈ [1, i − 1]. Then clearly p_{i−1} ⊑ q_{i−1}. In fact, Lemma 5.4 (i) implies that even p_{i−1} → q_{i−1}, since x → y and x′ → y′ imply xx′ → yy′. Then, by Lemma 5.4 (ii), p_i = p_{i−1}v(uv)^{n−1}(uv) has a residue of uv in q_i = q_{i−1}(uv)^n. To conclude ϕ_m(s) ⋢ ϕ_m(t), it remains to be shown that this can never be rectified when considering the prefixes p_j and q_j for j = i + 1, …, m. To this end, Lemma 5.4 (ii) and (iii) tell us that if p_j has a residue of (uv)^ℓ in q_j, then the word p_{j+1} has a residue of (uv)^ℓ or even (uv)^{ℓ+1} in q_{j+1}.
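For the smallest interesting instance u = a, v = b (so uv = ab is primitive) and m = 2, the claimed equivalence between the componentwise order on tuples and subword embedding of the φ-images can be verified by brute force; the subsequence test below is the standard one, and the whole snippet is an illustration, not part of the proof:

```python
from itertools import product

def is_subword(x, y):
    # x ⊑ y: x is obtained from y by deleting letters (scattered subword).
    it = iter(y)
    return all(c in it for c in x)

u, v = "a", "b"              # |u| = |v| and uv = "ab" is primitive
m = 2
n = len(u + v) + m + 3       # n = |uv| + m + 3, as in the text

def phi(t):                  # phi_m(t) = v^t1 (uv)^n ... v^tm (uv)^n
    return "".join((v if ti else "") + (u + v) * n for ti in t)

for s, t in product(product((0, 1), repeat=m), repeat=2):
    pointwise = all(si <= ti for si, ti in zip(s, t))
    assert is_subword(phi(s), phi(t)) == pointwise
```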

# **6 The Σ₁-Theory with Constants**

In this section, we study for which languages L the structure (L, ⊑, (w)_{w∈L}) has a decidable Σ₁-theory. From Theorem 3.4, we know that this is the case whenever L is bounded. However, there are very simple languages for which decidability is lost: if |Σ| ≥ 2, then the Σ₁-theory of (Σ∗, ⊑, (w)_{w∈Σ∗}) is undecidable [6]. Here, we present a sufficient condition for the Σ₁-theory of (L, ⊑, (w)_{w∈L}) to be decidable.

Let L ⊆ Σ∗. We say that a letter a ∈ Σ is *frequent* in L if there is a real constant δ > 0 such that |w|_a ≥ δ · |w| for all but finitely many w ∈ L. Our sufficient condition requires that all letters be frequent in L. If L is regular, this is equivalent to saying that in every non-degenerate automaton for L, every cycle contains every letter. An example of such a language is {ab, ba}∗.

We shall prove that this condition implies decidability of the Σ₁-theory of (L, ⊑, (w)_{w∈L}). If L is bounded, decidability already follows from Theorem 3.4. In case L is unbounded, we employ our results from Sect. 5 to show another embeddability result. For w ∈ Σ∗, let w↑ = {u ∈ Σ∗ | w ⊑ u} denote the upward closure of {w} in (Σ∗, ⊑). We will show that if L is unbounded, then for each w ∈ Σ∗, the decomposition L = (L \ w↑) ∪ (L ∩ w↑) yields two simple parts: the set L \ w↑ is finite and the set L ∩ w↑ embeds every finite partial order. This simplifies the conditions under which a Σ₁-sentence is satisfied.

**Lemma 6.1.** *Let L ⊆ Σ∗ be an unbounded regular language where every letter is frequent. For every w ∈ Σ∗, the set L \ w↑ is finite and L ∩ w↑ is unbounded.*

*Proof.* In a non-degenerate automaton A for L, every cycle must contain every letter. Therefore, if A has n states and v ∈ L satisfies |v| > n · |w|, then a computation for v must visit some state more than |w| times, which implies w ⊑ v and hence v ∉ L \ w↑. Therefore, L \ w↑ is finite. This implies that L ∩ w↑ is unbounded: otherwise, L = (L ∩ w↑) ∪ (L \ w↑) would be bounded as well.
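Lemma 6.1 can be illustrated empirically for L = {ab, ba}∗ and w = aab: enumerating block sequences shows that only finitely many words of L (here: only words of length at most 4) avoid w as a subword. The length bound 4 and the enumeration depth are specific to this hand-picked example:

```python
from itertools import product

def is_subword(x, y):
    # Scattered-subword (subsequence) test.
    it = iter(y)
    return all(c in it for c in x)

# Enumerate L = {ab, ba}* up to 5 blocks, i.e. words of length <= 10.
L = {"".join(blocks) for k in range(6)
     for blocks in product(("ab", "ba"), repeat=k)}

w = "aab"
outside = {v for v in L if not is_subword(w, v)}    # samples of L \ w↑
assert outside                                      # non-empty ...
assert max(map(len, outside)) <= 4                  # ... but only short words
```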

**Theorem 6.2.** *Let L ⊆ Σ∗ be an unbounded regular language where every letter is frequent. Then the Σ₁-theory of (L, ⊑, (w)_{w∈L}) is decidable.*

*Proof.* For decidability, we may assume that we are given a formula ϕ that is a disjunction of conjunctions of literals of the following forms (where x and y are arbitrary variables and w an arbitrary word from L):

$$\begin{array}{lll} \text{(i)}\ x \sqsubseteq w & \text{(iii)}\ w \sqsubseteq x & \text{(v)}\ x \sqsubseteq y\\ \text{(ii)}\ x \not\sqsubseteq w & \text{(iv)}\ w \not\sqsubseteq x & \text{(vi)}\ x \not\sqsubseteq y \end{array}$$

*Step 1.* We first show that literals of types (i) and (iv) can be eliminated. To this end, we observe that for each w ∈ L, both of the sets {u ∈ L | u ⊑ w} and {u ∈ L | w ⋢ u} are finite (in the latter case, this follows from Lemma 6.1). Thus, every conjunction that contains a literal x ⊑ w or w ⋢ x constrains x to finitely many values. Therefore, we can replace this conjunction with a disjunction of conjunctions that result from replacing x by one of these values. (Here, we might obtain literals u ⊑ v or u ⋢ v, but those can be replaced by equivalent formulas.) We repeat this until there are no more literals of the forms (i) and (iv).

*Step 2.* We now eliminate literals of the form (ii). Note that the language {u ∈ L | u ⋢ w} is upward closed in (L, ⊑). Since L is regular, we can compute the finite set of minimal elements of this set. Thus, x ⋢ w is equivalent to a finite disjunction of literals of the form w′ ⊑ x. The resulting formula ψ is a disjunction of conjunctions of literals of the forms (iii), (v), (vi).

*Step 3.* To check satisfiability, we may assume that ψ is a conjunction of literals of the forms (iii), (v), (vi). We can write ψ as γ₁ ∧ γ₂, where γ₁ is a conjunction of literals of the form (iii) and γ₂ is a conjunction of literals of the forms (v) and (vi). We claim that ψ is satisfiable if and only if γ₂ is satisfiable in some partial order. The "only if" direction is trivial, so suppose γ₂ is satisfied in some finite partial order (P, ≤) and let w ∈ Σ∗ be a concatenation of all words occurring in γ₁. By Lemma 6.1, L ∩ w↑ is unbounded, which implies that (P, ≤) can be embedded into (L ∩ w↑, ⊑) (see Sect. 5). This means there exists a satisfying assignment for γ₂ in which even w ⊑ x holds for every variable x. In particular, it satisfies ψ = γ₁ ∧ γ₂.
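Whether γ₂ holds in *some* partial order can be decided by a simple closure computation: take the reflexive-transitive closure of the ⊑-literals and verify that no ⋢-literal is forced. This is our own sketch of such a test, not an algorithm from the paper:

```python
def gamma2_satisfiable(le, nle, variables):
    # le: pairs (x, y) for literals x ⊑ y; nle: pairs for literals x ⋢ y.
    # γ2 is satisfiable in some partial order iff the reflexive-transitive
    # closure of the ⊑-constraints contains no pair that a ⋢-literal forbids.
    reach = {(x, x) for x in variables} | set(le)
    changed = True
    while changed:                       # naive transitive-closure fixpoint
        changed = False
        for a, b in list(reach):
            for c, d in list(reach):
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return all(pair not in reach for pair in nle)

assert gamma2_satisfiable([("x", "y"), ("y", "z")], [("z", "x")], "xyz")
assert not gamma2_satisfiable([("x", "y"), ("y", "z")], [("x", "z")], "xyz")
```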

### **Open Questions**

We did not consider complexity issues. In particular, from [13], we know that the FO²-theory of the structure (Σ∗, ⊑, (w)_{w∈Σ∗}) can be decided in elementary time. We are currently working out the details of the extension of this result to the C+MOD²-theory of the structure (L, ⊑, (w)_{w∈L}) for regular languages L. We reduced the FO+MOD-theory of the full structure (for L context-free and bounded) to the FO+MOD-theory of (ℕ, +), which is known to be decidable in elementary time [5]. Our reduction increases the formula exponentially due to the need to handle statements of the form "there is an even number of pairs (x, y) ∈ ℕ² such that ...". It should be checked whether the proof from [5] can be extended to handle such statements in FO+MOD over (ℕ, +) directly.

Finally, our results raise an interesting question: for which regular languages L does the structure (L, ⊑, (w)_{w∈L}) have a decidable Σ₁-theory? If every letter is frequent in L, we have decidability. For example, this applies to L = {ab, ba}∗ or L = {ab, baa}∗ ∪ bb{abb}∗. If L = Σ∗ for |Σ| ≥ 2, we have undecidability [6].

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Strong Adequacy and Untyped Full-Abstraction for Probabilistic Coherence Spaces**

Thomas Leventis¹,²(B) and Michele Pagani¹

¹ IRIF UMR 8243, Université Paris Diderot, Sorbonne Paris Cité, CNRS, Paris, France {leventis,pagani}@irif.fr ² University of Bologna, Bologna, Italy

**Abstract.** We consider the probabilistic untyped lambda-calculus and prove a stronger form of the adequacy property for probabilistic coherence spaces (PCoh), showing how the denotation of a term statistically distributes over the denotations of its head-normal forms.

We use this result to state a precise correspondence between PCoh and a notion of probabilistic Nakajima trees, recently introduced by Leventis in order to prove a separation theorem. As a consequence, we get full abstraction for PCoh. This latter result has already been mentioned as a corollary of Clairambault and Paquet's full abstraction theorem for probabilistic concurrent games. Our approach allows us to prove the property directly, without the need for a third model.

**Keywords:** Lambda-Calculus · Denotational semantics · Probabilistic functional programming

# **1 Introduction**

Full abstraction for the maximal consistent sensible λ-theory H∗ [1] is a crucial property for a model of the untyped λ-calculus, stating that two terms M, N have the same denotation in the model iff for every context C[ ] the head-reduction sequences of C[M] and C[N] either both terminate or both diverge. The first such result was obtained for Scott's model D∞ by Hyland [10] and Wadsworth [15]. More recently, Manzonetto developed a general technique for achieving full abstraction for a large class of models, decomposing it into the *adequacy property* and a notion of *well-stratification* [13]. An adequacy property states that the semantics of a λ-term is different from the bottom element iff its head-reduction terminates. Well-stratification is more technical; basically, it means that the semantics of a λ-term can be stratified into different levels, expressing in the model the nesting of the head-normal forms defining the interaction between a λ-term and a context.

Our paper reconsiders these results in the setting of the probabilistic untyped λ-calculus Λ<sup>+</sup>. The language extends the untyped λ-calculus with a barycentric sum constructor allowing for terms like M +<sup>p</sup> N, with p ∈ [0, 1], reducing to M with probability p and to N with probability 1 − p. In recent years there has been a renewed interest in Λ<sup>+</sup> as a core language for (untyped) discrete probabilistic functional programming. In particular, Leventis proves in [12] a separation property for Λ<sup>+</sup> based on a probabilistic version of *Nakajima trees*, the latter describing a nesting of sub-probability distributions of infinitary η-long head-normal forms (see Sect. 5 and the examples in Fig. 2).

We consider the semantics of Λ⁺ given by the probabilistic coherence space D defined by Danos and Ehrhard in [5] and proved to be adequate in [6]. We show that the denotation ⟦M⟧ in D of a Λ⁺ term M enjoys a kind of stratification property (Theorem 1, called here *strong adequacy*) and we use this property to prove that ⟦M⟧ is a faithful description of the probabilistic Nakajima tree of M (Corollary 1). As a consequence of this result and the previously mentioned separation theorem, we achieve full abstraction for D (Theorem 2), thus reconstructing in this setting Manzonetto's reasoning for the classical λ-calculus.

Very recently, and independently from this work, Clairambault and Paquet also proved full abstraction for D [2]. Their proof uses a game semantics model representing in an abstract way the probabilistic Nakajima trees and a faithful functor from this game semantics to the weighted relational semantics of [11]. The latter provides a model having the same equational theory over Λ⁺ as the probabilistic coherence space D, so full abstraction for D follows immediately. By the way, let us emphasise that all results in our paper can be transferred as they are to the weighted relational semantics of [11]. We decided, however, to consider the probabilistic coherence space model in order to highlight the correspondence between the definition of D (Eq. (11)) and the definition of the logical relation (Eq. (13)), which is the key ingredient in the proof of our notion of stratification.

Let us give some more intuitions on this latter notion, which has an interest of its own. The model D is defined as the limit of a chain of probabilistic coherence spaces (D_n)_{n∈ℕ} approximating more and more the denotation of Λ⁺ terms. The adequacy property proven in [6] states that the probability of a term M converging to a head-normal form is given by the mass of the semantics ⟦M⟧ restricted to the subspace D₂ [6, Theorem 22]. The natural question is then to understand which kind of operational meaning is carried by the rest of the mass of ⟦M⟧, i.e. the points of order greater than 2. Our Theorem 1 answers this question, showing that the semantics ⟦M⟧ distributes over the semantics of its head-normal forms according to the operational semantics of Λ⁺. By iterating this reasoning one gets a stratification of ⟦M⟧ into a nesting of (η-expanded) head-normal forms, which is the key ingredient linking ⟦M⟧ and the probabilistic Nakajima trees (Corollary 1).

The fact that our proof of full abstraction is based on the notion of strong adequacy makes it very plausible that the proof can be adapted to a more general class of models than only probabilistic coherence spaces and weighted semantics. In particular, we would like to stress that we did not use the analyticity of term denotations, which is instead at the core of the proof of full abstraction for probabilistic PCF-like languages [7,8].

*Notational convention.* We write ℕ for the set of natural numbers and ℝ≥0 for the set of non-negative real numbers. Given any set X, we write Mf(X) for the set of **finite multisets of** *X*: an element m ∈ Mf(X) is a function X → ℕ such that the **support** of m, Supp(m) = {x ∈ X | m(x) > 0}, is finite. We write [x₁, …, xₙ] for the multiset m such that m(x) is the number of indices i s.t. x = xᵢ; so [] is the empty multiset, and ⊎ denotes the disjoint union of multisets. The **Kronecker delta** over a set X is defined for x, y ∈ X by: δ_{x,y} = 1 if x = y, and δ_{x,y} = 0 otherwise.
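In programming terms, these conventions can be modelled directly; the `collections.Counter` encoding below is our own, non-normative choice:

```python
from collections import Counter

m = Counter(["a", "a", "b"])          # the multiset [a, a, b]
assert m["a"] == 2 and m["b"] == 1    # m(x) counts the indices i with x = x_i
assert set(m) == {"a", "b"}           # Supp(m), which is finite
empty = Counter()                     # [] is the empty multiset
assert m + empty == m                 # Counter addition models disjoint union

def delta(x, y):                      # the Kronecker delta over any set
    return 1 if x == y else 0

assert delta("a", "a") == 1 and delta("a", "b") == 0
```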

### **2 The Probabilistic Language** *Λ***<sup>+</sup>**

We recall the call-by-name untyped probabilistic λ-calculus, following [6]. The set Λ⁺ of terms over a set V of variables is defined inductively by:

$$M, N \in \Lambda^+ ::= x \mid \lambda x.M \mid MN \mid M +_p N,\tag{1}$$

where x ranges over V and p ranges over [0, 1]. Note that we consider probabilities over the whole interval [0, 1], but our proofs still hold if we restrict them to rational numbers. We use the λ-calculus terminology and notations of [1]: terms are considered modulo α*-equivalence*, i.e. variable renaming; we write FV(M) for the set of free variables of a term M. For any finite list of variables Γ = x₁, …, xₙ we write Λ⁺_Γ for the set of terms M ∈ Λ⁺ such that FV(M) ⊆ {x₁, …, xₙ}. Given two terms M, N ∈ Λ⁺ and x ∈ V, we write M{N/x} for the term obtained by substituting N for the free occurrences of x in M, subject to the usual proviso of renaming bound variables of M to avoid capture of free variables of N.

*Example 1.* Some terms useful in giving examples: the duplicator δ = λx.xx, the Turing fixed-point combinator **Θ** = (λxy.y(xxy))(λxy.y(xxy)), and **Ω** = δδ.

A **context** C[ ] is a term containing a single occurrence of a distinguished variable denoted [ ] and called the hole. A **head-context** is of the form E[ ] = λx₁ … xₙ.[ ]M₁ … Mₖ, for n, k ≥ 0 and Mᵢ ∈ Λ⁺. Given M ∈ Λ⁺, we write C[M] for the term obtained by replacing the hole in C[ ] with M, possibly with capture of free variables. The **operational semantics** is given by a Markov chain over Λ⁺, mixing the standard head-reduction of the untyped λ-calculus with the probabilistic choice +_p. Precisely, this system is given by the transition matrix Red in Eq. (2). It is well known that any Λ⁺-term M can be uniquely decomposed into E[R] for E[ ] a head-context and R either a β-redex, a +_p-redex (for some p ∈ [0, 1]), or a variable in V. This gives the following cases:

$$\mathrm{Red}_{E[R],N} ::= \begin{cases} 1 & \text{if } R = (\lambda x.M')M'' \text{ and } N = E[M'\{M''/x\}] \\ p & \text{if } R = M' +_p M'',\ M' \neq M'' \text{ and } N = E[M'] \\ 1 - p & \text{if } R = M' +_p M'',\ M' \neq M'' \text{ and } N = E[M''] \\ 1 & \text{if } R = M' +_p M' \text{ and } N = E[M'] \\ 1 & \text{if } R \in \mathcal{V} \text{ and } N = E[R] \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

This matrix is stochastic, i.e. for any term M, ∑_N Red_{M,N} = 1. A **head-normal form** is a term of the form E[y], with y ∈ V called its **head-variable**. We write HNF for the set of all head-normal forms. Following [5,6], we consider the head-normal forms as absorbing states of the process. Hence the n-th power Red^n of the matrix Red describes the process of performing *exactly* n steps: Red^n_{M,N} is the probability that after n process steps M will reach state N.

*Example 2.* Let L = (x +_p y). We have Red_{δL,LL} = 1, and Red^n_{δL,xL} = p, Red^n_{δL,yL} = 1 − p for all n ≥ 2. In fact both xL and yL are head-normal forms, hence absorbing states. The term **Ω** β-reduces to itself, so Red^n_{Ω,Ω} = 1 for any n, giving an example of an absorbing state which is not a head-normal form.

The Turing fixed-point combinator needs two β-steps to unfold its argument, so, for any term M, Red²_{ΘM,M(ΘM)} = 1. In case M is a probabilistic function like M = λf.(f +_p y), we get Red^{4n}_{ΘM,ΘM} = p^n and Red^{4n}_{ΘM,y} = 1 − p^n, for any n. In case M = λf.(yf +_p y), we get Red^{4(n+1)}_{ΘM,y^n(ΘM)} = p^{n+1} and Red^{4(n+1)}_{ΘM,y^n(y)} = (1 − p)p^n, where y^n(…) denotes the n-fold application y(… y(…)).
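The first chain of Example 2 is finite-state, so its n-step behaviour can be checked with matrix powers; the state numbering and the value of p below are our own choices:

```python
import numpy as np

p = 0.3
# States: 0 = δL, 1 = LL, 2 = xL, 3 = yL, where L = (x +_p y).
Red = np.array([
    [0.0, 1.0, 0.0, 0.0],      # δL -> LL by a β-step
    [0.0, 0.0, p, 1.0 - p],    # LL = (x +_p y)L -> xL or yL by a +_p-step
    [0.0, 0.0, 1.0, 0.0],      # xL is an absorbing head-normal form
    [0.0, 0.0, 0.0, 1.0],      # yL is an absorbing head-normal form
])
assert np.allclose(Red.sum(axis=1), 1.0)            # Red is stochastic
Red5 = np.linalg.matrix_power(Red, 5)
assert abs(Red5[0, 2] - p) < 1e-12                  # Red^n_{δL,xL} = p, n >= 2
assert abs(Red5[0, 3] - (1.0 - p)) < 1e-12          # Red^n_{δL,yL} = 1 - p
```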

Notice that for h ∈ HNF and M ∈ Λ⁺, the sequence (Red^n_{M,h})_{n∈ℕ} is monotone increasing and bounded by 1, so it converges. We define its limit by:

$$\forall M \in \Lambda^+, \forall h \in \mathrm{HNF}, \quad \mathrm{Red}^{\infty}_{M,h} ::= \sup_{n \in \mathbb{N}} \{ \mathrm{Red}^{n}_{M,h} \} \in [0,1]. \tag{3}$$

This quantity gives the total probability of M reducing to the head-normal form h by some finite reduction sequence, of which there may be infinitely many.

*Example 3.* Recall the terms in Example 2. We have Red^∞_{δL,xL} = p and Red^∞_{δL,yL} = 1 − p. For any h ∈ HNF and n ∈ ℕ we have Red^n_{Ω,h} = 0, so Red^∞_{Ω,h} = 0. The quantity Red^∞_{Θ(λf.(f+_py)),y} is the first example of a proper limit, being equal to 1 whereas Red^n_{Θ(λf.(f+_py)),y} < 1 for all n ∈ ℕ. Operationally this means that the term Θ(λf.(f +_p y)) reduces to y with probability 1 but the length of these reductions is not bounded. Finally, Red^∞_{Θ(λf.(yf+_py)),y^n(y)} = (1 − p)p^n; this means that Θ(λf.(yf +_p y)) converges with probability 1 but can reach infinitely many different head-normal forms.
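The limit Red^∞_{Θ(λf.(f+_p y)),y} = 1 can be illustrated numerically: by Example 2, the probability of having reached y after 4n steps is 1 − p^n, which approaches 1 without ever attaining it at a finite stage (p = 1/2 is our choice):

```python
p = 0.5
# Red^{4n}_{ΘM,y} = 1 - p^n for M = λf.(f +_p y), as in Example 2.
partial = [1 - p ** n for n in range(1, 40)]
assert all(q < 1 for q in partial)     # strictly below 1 at every finite stage
assert partial == sorted(partial)      # monotone increasing in n
assert 1 - partial[-1] < 1e-9          # the supremum, i.e. Red^∞, is 1
```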

Given M, N ∈ Λ⁺, we say that M is **contextually equivalent** to N if, and only if, for every context C[ ], ∑_{h∈HNF} Red^∞_{C[M],h} = ∑_{h∈HNF} Red^∞_{C[N],h}.

An important property in the following is **extensionality**, meaning invariance under η-equivalence. The η**-equivalence** is the smallest congruence such that, for any M ∈ Λ⁺ and x ∉ FV(M), we have M =_η λx.Mx. Notice that contextual equivalence is extensional (see [1] for the classical λ-calculus).

#### **3 Probabilistic Coherence Spaces**

Girard introduced probabilistic coherence spaces (PCS) as a "quantitative refinement" of coherence spaces [9]. Danos and Ehrhard then considered the category **Pcoh** of linear and Scott-continuous functions between PCS as a model of linear logic, and the cartesian closed category **Pcoh**! of entire functions between PCS as the Kleisli category associated with the comonad of **Pcoh** modelling the exponential modality [5]. They also proved that **Pcoh**! provides an adequate model of probabilistic PCF, and they defined the reflexive object D which is our object of study.

The two categories **Pcoh** and **Pcoh**! have since been studied in various papers. In particular, **Pcoh**! was proved to be fully abstract for call-by-name probabilistic PCF [7]. This result has also been extended to richer languages, e.g. call-by-push-value probabilistic PCF [8]. The untyped model D was proven adequate for Λ⁺ [6]. This paper is the continuation of the latter result, showing full abstraction for D as a consequence of a stronger form of adequacy.

We briefly recall here the cartesian closed category **Pcoh**! and the reflexive object D. For reasons of space, we omit the linear logic model **Pcoh**, from which **Pcoh**! is derived. We refer the reader to [5,6] for more details.

*Probabilistic coherence spaces and entire functions.* A **probabilistic coherence space**, or PCS for short, is a pair X = (|X|, P(X)) where |X| is a countable set called the **web** of X and P(X) is a subset of the semi-module (ℝ≥0)^{|X|} such that the following three conditions hold: (i) *closedness*: P(X)^⊥⊥ = P(X), where, given a set P ⊆ (ℝ≥0)^{|X|}, the **dual of** P is defined as P^⊥ ::= {y ∈ (ℝ≥0)^{|X|} | ∀x ∈ P, ∑_{a∈|X|} x_a y_a ≤ 1}; (ii) *boundedness*: ∀a ∈ |X|, ∃μ > 0, ∀x ∈ P(X), x_a ≤ μ; (iii) *completeness*: ∀a ∈ |X|, ∃x ∈ P(X), x_a > 0.

Given x, y ∈ P(X), we write x ≤ y for the order defined pointwise, i.e. x_a ≤ y_a for every a ∈ |X|. The closedness condition is equivalent to requiring that P(X) be convex and Scott-closed, as stated below.

**Proposition 1 (e.g.** [4]**).** *Given an index set I and a subset P ⊂ (ℝ≥0)^I which is bounded and complete, we have P = P^⊥⊥ iff the following two conditions hold: (i) P is convex, i.e. for every x, y ∈ P and λ ∈ [0, 1], λx + (1 − λ)y ∈ P; (ii) P is Scott-closed, i.e. for every x ≤ y ∈ P, x ∈ P, and for every increasing chain {x_i}_{i∈ℕ} ⊆ P, sup_i x_i ∈ P.*

A data-type is denoted by a PCS X and its data by vectors in P(X ): convexity allows for probabilistic superposition and Scott-closedness for recursion.

*Example 4.* A simple example of a PCS is U = (|U|, P(U)) with |U| a singleton set and P(U) = [0, 1]. Notice P(U)^⊥ = P(U). This PCS gives the flat interpretation of the unit type in a typed language. The boolean type is denoted by the two-dimensional PCS B ::= ({t, f}, {(ρ_t, ρ_f) | ρ_t + ρ_f ≤ 1}). Notice that P(B) can be seen as the set of probabilistic sub-distributions over the boolean values.
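The closedness condition P(B)^⊥⊥ = P(B) can be checked on a finite grid for the boolean PCS B; coordinates are kept in tenths (integers) to avoid floating-point issues, and the whole snippet is purely illustrative:

```python
grid = [(i, j) for i in range(11) for j in range(11)]   # vectors in tenths
P = [x for x in grid if x[0] + x[1] <= 10]              # P(B): ρt + ρf ≤ 1

def dot_ok(x, y):
    # x·y ≤ 1, with the products taken in hundredths.
    return x[0] * y[0] + x[1] * y[1] <= 100

dual = [y for y in grid if all(dot_ok(x, y) for x in P)]
assert len(dual) == len(grid)            # P(B)^⊥ is the whole unit square
bidual = [x for x in grid if all(dot_ok(x, y) for y in dual)]
assert set(bidual) == set(P)             # P(B)^⊥⊥ = P(B): closedness holds
```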

As soon as one considers functional types, the intuitive notion of (discrete) sub-probability distribution is lost. In particular, the reflexive object D defined below is an example of an infinite-dimensional PCS where arbitrarily big scalars may appear in P(D). One can think of PCSs as a generalisation of the notion of discrete sub-probability distributions allowing for a cartesian closed category.

An **entire function from** X **to** Y is a matrix f ∈ (ℝ≥0)^{Mf(|X|)×|Y|} such that for any x ∈ P(X), the image f(x) of x under f belongs to P(Y), where f(x) is

$$f(x) ::= \left(\sum_{m \in \mathcal{M}_{\mathrm{f}}(|\mathcal{X}|)} f_{m,b}\, x^m \right)_{b \in |\mathcal{Y}|} \qquad \text{where } x^m ::= \prod_{a \in \mathrm{Supp}(m)} x_a^{m(a)} \tag{4}$$

Notice that the condition f(x) ∈ P(Y) requires that the possibly infinite sum in the previous equation converges. Recently, Crubillé proved that entire maps can be characterised independently of their matrix representation as the absolutely monotonic and Scott-continuous maps between PCSs; see [3].
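Eq. (4) can be evaluated directly for a toy matrix on the one-point web |U| = {∗}: the matrix with f_{[∗,∗],∗} = 1 (and 0 elsewhere) denotes the entire map x ↦ x², which indeed sends P(U) = [0, 1] into itself. The dictionary encoding of the matrix below is our own:

```python
def apply_entire(f, x):
    # f maps (multiset over |X| encoded as a sorted tuple, b in |Y|) to a
    # coefficient; computes f(x)_b = sum_m f_{m,b} * x^m, as in Eq. (4).
    out = {}
    for (m, b), coeff in f.items():
        term = coeff
        for a in m:                       # x^m = product over the multiset m
            term *= x[a]
        out[b] = out.get(b, 0.0) + term
    return out

f = {(("*", "*"), "*"): 1.0}              # the "squaring" matrix on U
for t in (0.0, 0.5, 1.0):
    y = apply_entire(f, {"*": t})
    assert abs(y["*"] - t ** 2) < 1e-12   # f(x) = x², which stays in [0, 1]
```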

*The cartesian closed category.* The Kleisli category **Pcoh**! has PCSs as objects and entire maps as morphisms. Given f ∈ **Pcoh**!(X, Y) and g ∈ **Pcoh**!(Y, Z), the **composition** g ◦ f is the usual functional composition, whose matrix can be given explicitly, for m ∈ Mf(|X|) and c ∈ |Z|, by:

$$(g \circ f)_{m,c} ::= \sum_{p \in \mathcal{M}_{\mathrm{f}}(|\mathcal{Y}|)} g_{p,c}\, f^{(m,p)} \qquad \text{where } f^{(m,[b_1,\dots,b_n])} ::= \sum_{m_1 \uplus \cdots \uplus m_n = m} \; \prod_{i=1}^{n} f_{m_i,b_i} \tag{5}$$

The boundedness condition on Z and the completeness condition on X ensure that the possibly infinite sum over p ∈ M_f(|Y|) in Eq. (5) converges. The **identity** is the matrix (id_X)_{m,a} = δ_{m,[a]}, where δ is the Kronecker delta.
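The coefficient f^{(m,p)} of Eq. (5) enumerates all ordered decompositions of the multiset m. A Python sketch of this combinatorial sum (our encoding, assuming the matrix representation from the previous sketch; not the paper's code):

```python
from collections import Counter
from itertools import product

# Hypothetical sketch of Eq. (5): f^{(m, p)} sums, over all ways of splitting
# the multiset m into n parts (one per element of p = [b1,...,bn]), the
# product of the matrix entries f_{m_i, b_i}.

def compositions(k, n):
    """All ways to write k as an ordered sum of n non-negative integers."""
    if n == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, n - 1):
            yield (first,) + rest

def splittings(m, n):
    """All ordered decompositions (m1, ..., mn) with m1 ⊎ ... ⊎ mn = m."""
    items = sorted(m.items())
    per_element = [list(compositions(k, n)) for _, k in items]
    for choice in product(*per_element):
        parts = [Counter() for _ in range(n)]
        for (a, _), counts in zip(items, choice):
            for i, c in enumerate(counts):
                if c:
                    parts[i][a] = c
        yield tuple(parts)

def power_coeff(f, m, p):
    """f^{(m, [b1,...,bn])}; multisets are sorted tuples, f maps (m, b) to a scalar."""
    if not p:
        return 1.0 if not m else 0.0
    total = 0.0
    for parts in splittings(Counter(m), len(p)):
        prod = 1.0
        for mi, bi in zip(parts, p):
            prod *= f.get((tuple(sorted(mi.elements())), bi), 0.0)
        total += prod
    return total

# With the "negation" matrix, the only non-zero splitting of [t, f] against
# [f, t] is m1 = [t], m2 = [f], giving f_{[t],f} * f_{[f],t} = 1.
neg = {(("t",), "f"): 1.0, (("f",), "t"): 1.0}
assert power_coeff(neg, ("t", "f"), ("f", "t")) == 1.0
```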

The **cartesian product** of any countable family (X_i)_{i∈I} of PCS's is:

$$\begin{aligned} \left| \prod\_{i \in I} \mathcal{X}\_i \right| &:= \bigcup\_{i \in I} \{ i \} \times \left| \mathcal{X}\_i \right|, \\ \mathrm{P} \left( \prod\_{i \in I} \mathcal{X}\_i \right) &:= \{ x \in (\mathbb{R}\_{\geq 0})^{|\prod\_{i \in I} \mathcal{X}\_i|} \, \vert \, \forall i \in I, \pi\_i(x) \in \mathrm{P}(\mathcal{X}\_i) \}, \end{aligned} \tag{6}$$

where π_i(x) is the vector in (R_{≥0})^{|X_i|} giving the i-th component of x, i.e. π_i(x)_a ::= x_{(i,a)}. This means that P(∏_{i∈I} X_i) can be seen as the set-theoretical product ∏_{i∈I} P(X_i), by mapping x ∈ P(∏_{i∈I} X_i) to the sequence (π_i(x))_{i∈I}. The j-th projection pr_j ∈ **Pcoh**!(∏_{i∈I} X_i, X_j) is defined by (pr_j)_{m,b} ::= δ_{m,[(j,b)]}. If all components of a product are equal to a PCS X we use the exponential notation X^I. Binary products are written X × Y. In the following, we will often denote the finite multisets in M_f(|∏_{i∈I} X_i|) as I-families of finite multisets almost everywhere empty, using the set-theoretical isomorphism:<sup>1</sup>

$$\mathcal{M}\_{\mathbf{f}}\left(\left|\prod\_{i\in I} \mathcal{X}\_{i}\right|\right) \quad \simeq \qquad \{\mathfrak{m} \in \prod\_{i\in I} \mathcal{M}\_{\mathbf{f}}(|\mathcal{X}\_{i}|) \; | \; \text{Supp}\,(\mathfrak{m}) \; \text{finite}\}.\tag{7}$$

For example, the multiset [(0, a), (0, a′), (1, b)] ∈ M_f(|X × Y|) will be denoted as the pair ([a, a′], [b]), and the multiset [(2, a), (4, a′), (4, a)] ∈ M_f(|∏_{n∈N} X_n|) as the almost everywhere empty sequence ([], [], [a], [], [a′, a], [], ...).
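The isomorphism (7) is a simple re-indexing; a small Python sketch of the two directions (our encoding of multisets as `Counter`s, not the paper's):

```python
from collections import Counter

# Hypothetical sketch of the isomorphism (7): a finite multiset of pairs (i, a)
# over a product web corresponds to a family, indexed by i, of finite multisets
# that is empty almost everywhere (only finitely many indices appear).

def to_family(m):
    """[(i1,a1), (i2,a2), ...]  ↦  {i: multiset of the a's tagged with i}."""
    fam = {}
    for i, a in m:
        fam.setdefault(i, Counter())[a] += 1
    return fam

def to_multiset(fam):
    """Inverse direction: tag each element of the i-th multiset with i."""
    return Counter((i, a) for i, ms in fam.items() for a in ms.elements())

m = Counter([(0, "a"), (0, "a'"), (1, "b")])
fam = to_family(m.elements())
assert fam == {0: Counter({"a": 1, "a'": 1}), 1: Counter({"b": 1})}
assert to_multiset(fam) == m
```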

<sup>1</sup> In fact, this isomorphism corresponds, for I finite, to the fundamental exponential isomorphism !(A & B) ≅ !A ⊗ !B of linear logic.

The **object of morphisms** from X to Y is **Pcoh**!(X ,Y) itself, i.e.:

$$\left|\mathcal{X}\Rightarrow\mathcal{Y}\right| ::= \mathcal{M}\_{\mathbf{f}}(\left|\mathcal{X}\right|) \times \left|\mathcal{Y}\right|, \quad \mathbf{P}(\mathcal{X}\Rightarrow\mathcal{Y}) ::= \mathbf{P}\mathbf{c}\mathbf{o}\mathbf{h}\_{!}(\mathcal{X},\mathcal{Y}).\tag{8}$$

The proof that P(X⇒Y) so defined enjoys the closedness, completeness and boundedness conditions of the definition of a PCS is not trivial; it follows from the fact that **Pcoh**! is the Kleisli category associated with the exponential comonad of the linear logic model **Pcoh** mentioned in the introduction.

The **evaluation** Ev^{X,Y} ∈ **Pcoh**!((X⇒Y) × X, Y) and the **currying** Cur^{X,Z,Y}(v) ∈ **Pcoh**!(Z, X⇒Y) of a morphism v ∈ **Pcoh**!(X × Z, Y) are:

$$\operatorname{Ev}\_{\left(m,p\right),a}^{\mathcal{X},\mathcal{Y}} ::= \delta\_{m,\left[\left(p,a\right)\right]}, \qquad \operatorname{Cur}^{\mathcal{X},\mathcal{Z},\mathcal{Y}}\left(v\right)\_{m,\left(p,a\right)} ::= v\_{\left(p,m\right),a}. \tag{9}$$

*The reflexive object* D. We set X ⊆ Y whenever |X| ⊆ |Y| and P(X) = {v|_{|X|} s.t. v ∈ P(Y)}, where v|_{|X|} is the vector in (R_{≥0})^{|X|} obtained by restricting v ∈ (R_{≥0})^{|Y|} to the indexes in |X| ⊆ |Y|. This defines a complete order over PCS's. The model D of Λ^+ is then given by the least fixed point of the Scott-continuous functor X ↦ (X^N ⇒ U) (where U is the one-dimensional PCS defined in Example 4). We do not detail its definition here, but we give explicitly the chain D_0 = (∅, {0}), D_{ℓ+1} = (D_ℓ^N ⇒ U), whose (co)limit is the least fixed point D of X ↦ (X^N ⇒ U) by the Knaster-Tarski theorem. We refer to [5, Sect. 2] for details.

The webs of these spaces are given by:

$$|\mathcal{D}_0| ::= \emptyset, \quad |\mathcal{D}_{\ell+1}| ::= \mathcal{M}_{\mathrm{f}}(|\mathcal{D}_\ell|)^{(\omega)}, \quad |\mathcal{D}| ::= \bigcup_{\ell \in \mathbb{N}} |\mathcal{D}_\ell| \tag{10}$$

where M_f(|D_ℓ|)^{(ω)} denotes the set of infinite sequences of multisets over |D_ℓ| that are almost everywhere empty (notice we are using the isomorphism mentioned in Eq. (7)). The set |D_1| is the singleton containing the infinite sequence ([], [], [], ...) of empty multisets, which we denote by ∗. Given a multiset m ∈ M_f(|D_ℓ|) and a sequence d ∈ |D_{ℓ+1}|, we denote by m :: d the element of |D_{ℓ+1}| having m in first position and then all the multisets of d shifted by one position. Notice that any element of |D_{ℓ+1}| can be written as m_1 :: ... :: m_n :: ∗ for n sufficiently large and m_1, ..., m_n ∈ M_f(|D_ℓ|). In particular, [] :: ∗ = ∗.<sup>2</sup>
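A small Python sketch of this web (our representation, not the paper's: an almost-everywhere-empty sequence is stored as its finite prefix of non-empty multisets, so that the equation [] :: ∗ = ∗ holds by normalization):

```python
from collections import Counter

# Hypothetical sketch of the web |D|: an element is an almost-everywhere-empty
# sequence of finite multisets, kept as the finite prefix of non-empty entries.
# The base element ∗ = ([], [], ...) is therefore the empty tuple.

STAR = ()

def normalize(d):
    """Drop trailing empty multisets, so that [] :: ∗ = ∗."""
    d = list(d)
    while d and not d[-1]:
        d.pop()
    return tuple(d)

def cons(m, d):
    """m :: d — put the multiset m first, shifting the rest by one position."""
    return normalize((m,) + tuple(d))

assert cons(Counter(), STAR) == STAR        # [] :: ∗ = ∗
d = cons(Counter([STAR]), STAR)             # [∗] :: ∗, an element of |D_2|
assert d == (Counter({STAR: 1}),)
```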

The sets of vectors P(D_ℓ) and P(D) completing the definition of a PCS are:

$$\begin{aligned} \mathrm{P}(\mathcal{D}_0) &::= \{\mathbf{0}\} \\ \mathrm{P}(\mathcal{D}_{\ell+1}) &::= \Big\{ v \in (\mathbb{R}_{\geq 0})^{|\mathcal{D}_{\ell+1}|} \text{ s.t. } \forall n \in \mathbb{N}, \forall u_1,\dots,u_n \in \mathrm{P}(\mathcal{D}_\ell),\ \sum_{m_1,\dots,m_n \in \mathcal{M}_{\mathrm{f}}(|\mathcal{D}_\ell|)} v_{m_1::\dots::m_n::\ast}\, u_1^{m_1} \cdots u_n^{m_n} \le 1 \Big\} \\ \mathrm{P}(\mathcal{D}) &::= \left\{ v \in (\mathbb{R}_{\geq 0})^{|\mathcal{D}|} \text{ s.t. } \forall \ell \in \mathbb{N},\ v|_{|\mathcal{D}_\ell|} \in \mathrm{P}(\mathcal{D}_\ell) \right\} \end{aligned} \tag{11}$$

The above definition of P(D_{ℓ+1}) is actually equivalent to the standard one inferred from the definition of the countable product D_ℓ^N, which would require

<sup>2</sup> The elements of |D| can be seen as intersection types generated from the constant ∗, the :: operation playing the role of the arrow and multisets that of non-idempotent intersections.

$$\begin{aligned} \llbracket x \rrbracket^{\Gamma}_{\mathbf{m},d} &= \begin{cases} 1 & \text{if } \mathbf{m}_x = [d] \text{ and } \forall y \in \Gamma \setminus x,\ \mathbf{m}_y = [], \\ 0 & \text{otherwise,} \end{cases} \\ \llbracket \lambda x.M \rrbracket^{\Gamma}_{\mathbf{m},\,m::d} &= \llbracket M \rrbracket^{x,\Gamma}_{(m,\mathbf{m}),d}, \\ \llbracket MN \rrbracket^{\Gamma}_{\mathbf{m},d} &= \sum_{m \in \mathcal{M}_{\mathrm{f}}(|\mathcal{D}|)} \ \sum_{\substack{(\mathbf{m}_1,\mathbf{m}_2) \text{ s.t.}\\ \forall x \in \Gamma,\ \mathbf{m}_x = \mathbf{m}_{1x} \uplus\, \mathbf{m}_{2x}}} \llbracket M \rrbracket^{\Gamma}_{\mathbf{m}_1,\,m::d}\, \big(\llbracket N \rrbracket^{\Gamma}\big)^{(\mathbf{m}_2,m)}, \\ \llbracket M +_p N \rrbracket^{\Gamma}_{\mathbf{m},d} &= p\, \llbracket M \rrbracket^{\Gamma}_{\mathbf{m},d} + (1-p)\, \llbracket N \rrbracket^{\Gamma}_{\mathbf{m},d}. \end{aligned}$$

**Fig. 1.** Explicit definition of the denotation of a term in Λ^+_Γ as a matrix in P(D^Γ ⇒ D). Recall Eq. (5) for the notation (⟦N⟧^Γ)^{(𝐦_2,m)}.

to apply v to a countable family (u_i)_{i∈N} of vectors in P(D_ℓ). The two definitions are equivalent because of the continuity of scalar multiplication and addition.

It happens that any solution of X = (X^N ⇒ U) also gives a solution (although not a minimal one) to X = (X ⇒ X), and hence a reflexive object of **Pcoh**!. The isomorphism pair λ ∈ **Pcoh**!(D⇒D, D) and app ∈ **Pcoh**!(D, D⇒D) is given by, for any p ∈ M_f(|D ⇒ D|), m, q ∈ M_f(|D|), and d ∈ |D|:

$$
\lambda_{p,\,m::d} ::= \delta_{p,[(m,d)]}, \qquad \qquad \mathbf{app}_{q,(m,d)} ::= \delta_{q,[m::d]}. \tag{12}
$$

It is easy to check that app ◦ λ = id_{D⇒D} and λ ◦ app = id_D, so (D, λ, app) yields an extensional model of the untyped λ-calculus, i.e. ⟦M⟧ = ⟦N⟧ whenever M =_η N.

*Interpretation of the Terms of* Λ^+. Given a term M and a list Γ of pairwise distinct variables containing FV(M), the interpretation of M is a morphism ⟦M⟧^Γ ∈ **Pcoh**!(D^Γ, D), i.e. a matrix in (R_{≥0})^{M_f(|D^Γ|)×|D|} = (R_{≥0})^{M_f(|D|)^Γ×|D|}. The definition of ⟦M⟧^Γ is the standard one determined by the cartesian closed structure of **Pcoh**! and the reflexive object (D, λ, app): ⟦x⟧^Γ is the x-th projection of the product D^Γ; ⟦λx.M⟧^Γ = λ ◦ Cur(⟦M⟧^{x,Γ}); and ⟦MN⟧^Γ = Ev ◦ ⟨app ◦ ⟦M⟧^Γ, ⟦N⟧^Γ⟩, where ⟨·, ·⟩ is the pairing of two morphisms. Figure 1 makes the coefficients of the matrix ⟦M⟧^Γ explicit by structural induction on M. The only non-standard operation is the barycentric sum ⟦M +_p N⟧^Γ, which is still a morphism of **Pcoh**! by the convexity of P(D^Γ ⇒ D) (Proposition 1).
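To give a feel for Fig. 1 on a tiny fragment, here is a hypothetical Python sketch covering only the variable and barycentric-sum clauses (abstraction and application would need the full infinite matrices; the term encoding is ours):

```python
from collections import Counter

# Hypothetical sketch of the first and last clauses of Fig. 1: coefficients of
# the denotation of terms built from variables and barycentric sums only.
# An index is a pair (m, d), where m maps each variable to a finite multiset.

def denot(term, m, d):
    kind = term[0]
    if kind == "var":                   # [x]_{m,d} = 1 iff m_x = [d], others []
        x = term[1]
        ok = m.get(x, Counter()) == Counter([d]) and \
             all(ms == Counter() for y, ms in m.items() if y != x)
        return 1.0 if ok else 0.0
    if kind == "sum":                   # [M +_p N] = p[M] + (1-p)[N]
        _, p, M, N = term
        return p * denot(M, m, d) + (1 - p) * denot(N, m, d)
    raise ValueError("only variables and +_p are covered in this sketch")

star = ()                               # the base web element ∗
t = ("sum", 0.25, ("var", "x"), ("var", "y"))
assert denot(t, {"x": Counter([star])}, star) == 0.25
assert denot(t, {"y": Counter([star])}, star) == 0.75
```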

**Proposition 2 (Soundness, [5,6]).** *For every term* M ∈ Λ^+ *and sequence* Γ ⊇ FV(M)*:* ⟦M⟧^Γ = Σ_{N∈Λ^+_Γ} Red_{M,N} ⟦N⟧^Γ.

#### **4 Strong Adequacy**

In this section we state and prove Theorem 1, enhancing the **Pcoh**! adequacy property given in [6]. The latter explains the computational meaning of the mass of ⟦M⟧ restricted to D_2 ⊆ D, while our generalisation considers the whole of ⟦M⟧, showing that it encodes the way the operational semantics dispatches the mass over the denotations of the head-normal forms. As in [6], the proof of Theorem 1 adapts a method introduced by Pitts [14], consisting in building a recursively specified relation ⊲ of formal approximation (Proposition 3) which satisfies the same recursive equation as D. However, our generalisation requires a subtler definition of ⊲ than that of [6]. In particular, we must consider open terms in order to prove Lemma 7.

*The approximation relation.* Let us introduce some convenient notation, extending the definition of λ-abstraction and application to general morphisms.

**Definition 1.** *Given* v ∈ P(D^{x,Γ} ⇒ D)*, let* Λ(v) *be the vector* λ ◦ Cur(v) ∈ P(D^Γ ⇒ D)*. Given* v, u ∈ P(D^Γ ⇒ D)*, let* v @ u *be the vector* Ev ◦ ⟨app ◦ v, u⟩ ∈ P(D^Γ ⇒ D)*. Finally, given a finite sequence* u_1, ..., u_n ∈ P(D^Γ ⇒ D)*, for* n ∈ N*, we denote by* v @ u_1 ... u_n *the vector* (v @ u_1) @ ... @ u_n*.*

**Lemma 1.** *The map* v ↦ Λ(v) *is linear, i.e. for any vectors* v, v′ *and scalars* p, p′ ∈ [0, 1] *such that* p + p′ ≤ 1 *we have* Λ(pv + p′v′) = pΛ(v) + p′Λ(v′)*, and Scott-continuous, i.e. for any countable increasing chain* (v_n)_{n∈N}*,* Λ(sup_n(v_n)) = sup_n(Λ(v_n))*. The map* (v, u_1, ..., u_n) ↦ v @ u_1 ... u_n *is Scott-continuous in all of its arguments but linear only in its first argument* v*.*

*Proof.* Scott-continuity holds because scalar multiplication and sums are Scott-continuous. Linearity holds because the matrices app, λ are associated with linear maps (namely, they have non-zero coefficients only on singleton multisets, see (12)), as is the left-most component of Ev, see (9).

For any Γ ⊆ Δ there is a projection pr : P(D)^Δ → P(D)^Γ. Then, given a matrix v ∈ P(D^Γ ⇒ D), we denote by v↑^Δ ∈ P(D^Δ ⇒ D) the matrix corresponding to the pre-composition with pr of the morphism associated with v. This can be explicitly defined by, for 𝐦 ∈ M_f(|D|)^Δ, d ∈ |D|: v↑^Δ_{𝐦,d} = v_{(𝐦_x)_{x∈Γ},d} if ∀y ∈ Δ \ Γ, 𝐦_y = [], and v↑^Δ_{𝐦,d} = 0 otherwise.

We define an operation φ acting on the relations R ⊆ ⋃_Γ (P(D^Γ ⇒ D) × Λ^+_Γ). Each component φ^Γ(R) ⊆ P(D^Γ ⇒ D) × Λ^+_Γ is given by:

$$\begin{array}{c} (v,M) \in \phi^{\Gamma}(R) \text{ iff } \forall \Delta \supseteq \Gamma, \forall n \in \mathbb{N}, \forall u_1, \dots, u_n \in \mathrm{P}(\mathcal{D}^{\Delta} \Rightarrow \mathcal{D}),\\ \forall N_1, \dots, N_n \in \Lambda^{+}_{\Delta} \text{ s.t. } (u_i, N_i) \in R \text{ for all } i \le n,\\ v\!\upharpoonright^{\Delta} @\; u_1 \dots u_n \;\leq\; \sum_{h \in \mathrm{HNF}_{\Delta}} \mathrm{Red}^{\infty}_{M N_1 \dots N_n,\, h}\, \llbracket h \rrbracket^{\Delta}. \end{array} \tag{13}$$

The above definition is similar to Eq. (11), giving D_{ℓ+1} from D_ℓ. In the following we look for a fixed point of φ (Proposition 3). This quest is not simple because φ is not monotone. We therefore derive from φ a monotone operator ψ on a larger space, and we compute its fixed point by using Tarski's theorem (Lemma 3).

Given (R^+, R^−) ∈ P(⋃_Γ P(D^Γ ⇒ D) × Λ^+_Γ)^2, we define ψ(R^+, R^−) = (φ(R^−), φ(R^+)). Given two such pairs (R_1^+, R_1^−), (R_2^+, R_2^−), we define (R_1^+, R_1^−) ⊑ (R_2^+, R_2^−) iff R_1^+ ⊆ R_2^+ and R_1^− ⊇ R_2^−.

**Lemma 2.** *The relation* ⊑ *is an order relation giving a complete lattice structure on* P(⋃_Γ P(D^Γ ⇒ D) × Λ^+_Γ)^2*.*

Thanks to the previous lemma, we set (⊲^+, ⊲^−) to be the glb of the set {(R^+, R^−) | ψ(R^+, R^−) ⊑ (R^+, R^−)} of the pre-fixed points of ψ.

**Lemma 3.** ψ(⊲^+, ⊲^−) = (⊲^+, ⊲^−)*, so* ⊲^+ = φ(⊲^−) *and* ⊲^− = φ(⊲^+)*.*

*Proof.* One can check that ψ is monotone increasing wrt ⊑, so the result follows from Tarski's theorem on fixed points.

**Lemma 4.** *For any* R ⊆ ⋃_Γ P(D^Γ ⇒ D) × Λ^+_Γ *and* M ∈ Λ^+_Γ*, the set* {v ∈ P(D^Γ ⇒ D) | (v, M) ∈ φ^Γ(R)} *contains* 0*, is downward closed and chain closed.*

*Proof.* Consequence of the fact that the application v @ u_1 ... u_n and the lifting v↑^Δ are Scott-continuous (Lemma 1). Also, v↑^Δ is linear, as is v @ u_1 ... u_n in its left argument v (again Lemma 1), so 0↑^Δ @ u_1 ... u_n = 0.

**Proposition 3.** *We have* ⊲^+ = ⊲^−*. From now on we denote it simply by* ⊲*, writing* ⊲^Γ *for its component on* P(D^Γ ⇒ D) × Λ^+_Γ*.*

*Proof.* First, (⊲^−, ⊲^+) is a (pre-)fixed point of ψ, so (⊲^+, ⊲^−) ⊑ (⊲^−, ⊲^+), i.e. ⊲^+ ⊆ ⊲^−. To prove the converse, we reason by induction on the chain (D_ℓ)_{ℓ∈N}. For v ∈ P(D^Γ ⇒ D) and ℓ ∈ N, we write v|_ℓ for its restriction to D^Γ ⇒ D_ℓ, i.e.: (v|_ℓ)_{𝐦,d} = v_{𝐦,d} if d ∈ |D_ℓ|, and (v|_ℓ)_{𝐦,d} = 0 otherwise. Notice that v|_ℓ is a morphism in P(D^Γ ⇒ D), since v|_ℓ ≤ v ∈ P(D^Γ ⇒ D). We prove by induction on ℓ that:

$$\forall v \in \mathrm{P}\left(\mathcal{D}^{\Gamma} \Rightarrow \mathcal{D}\right), \forall M \in \Lambda^{+}_{\Gamma},\ (v, M) \in \lhd^{-} \text{ implies } (v|_{\ell}, M) \in \lhd^{+}.$$

For ℓ = 0 we have v|_0 = 0, so by Lemma 4 (v|_0, M) ∈ ⊲^+ = φ(⊲^−). At level ℓ + 1 we want to prove (v|_{ℓ+1}, M) ∈ ⊲^+ = φ(⊲^−). Let Δ ⊇ Γ, u_1, ..., u_n ∈ P(D^Δ ⇒ D), N_1, ..., N_n ∈ Λ^+_Δ be such that for all i ≤ n, (u_i, N_i) ∈ ⊲^−. By the induction hypothesis we have ((u_i)|_ℓ, N_i) ∈ ⊲^+ for all i ≤ n. Besides, by hypothesis (v, M) ∈ ⊲^− = φ(⊲^+), and v|_{ℓ+1} ≤ v, so Lemma 4 gives (v|_{ℓ+1}, M) ∈ φ(⊲^+). Hence v|_{ℓ+1}↑^Δ @ (u_1)|_ℓ ... (u_n)|_ℓ ≤ Σ_{h∈HNF_Δ} Red^∞_{MN_1...N_n,h} ⟦h⟧^Δ. We conclude by observing that v|_{ℓ+1}↑^Δ @ (u_1)|_ℓ ... (u_n)|_ℓ = v|_{ℓ+1}↑^Δ @ u_1 ... u_n.

Now, if (v, M) ∈ ⊲^− then for all ℓ ∈ N, (v|_ℓ, M) ∈ ⊲^+; but v = sup_{ℓ∈N} v|_ℓ, so Lemma 4 gives (v, M) ∈ ⊲^+.

*The key lemma.* Lemma 9 is the so-called key lemma for the relation ⊲. The reasoning is standard, except for the proof of Lemma 8, which enables strong adequacy.

#### **Lemma 5.** *For* M ∈ Λ^+_{x,Γ}, N ∈ Λ^+_Γ*,* (v, (λx.M)N) ∈ ⊲^Γ *iff* (v, M{N/x}) ∈ ⊲^Γ*.*

*Proof.* Observe that for all n ∈ N, N_1, ..., N_n ∈ Λ^+ and h ∈ HNF we have Red^∞_{(λx.M)NN_1...N_n,h} = Red^∞_{M{N/x}N_1...N_n,h}.

**Lemma 6.** *Let* (v, M) *and* (r, L) *be in* ⊲^Γ*; then* (pv + (1 − p)r, M +_p L) ∈ ⊲^Γ*.*

*Proof.* Simply observe that for all h ∈ HNF and N_1, ..., N_n ∈ Λ^+ we have Red^∞_{(M+_pL)N_1...N_n,h} = p Red^∞_{MN_1...N_n,h} + (1 − p) Red^∞_{LN_1...N_n,h}.
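The identity used in this proof is plain convex mixing of distributions; a hypothetical Python sketch (the toy head-normal-form names and distributions are ours):

```python
# Hypothetical sketch of the identity behind Lemma 6: the limit reduction
# distribution of M +_p L is the convex mixture of those of M and of L.

def mix(p, dist_M, dist_L):
    """Red^∞ of M +_p L from Red^∞ of M and of L (distributions as dicts)."""
    heads = set(dist_M) | set(dist_L)
    return {h: p * dist_M.get(h, 0.0) + (1 - p) * dist_L.get(h, 0.0) for h in heads}

red_M = {"h1": 0.5, "h2": 0.5}   # toy reduction distributions over hnfs
red_L = {"h1": 1.0}
assert mix(0.5, red_M, red_L) == {"h1": 0.75, "h2": 0.25}
```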

**Lemma 7.** *For all* x ∈ Γ*,* (pr^Γ_x, x) ∈ ⊲^Γ*.*

*Proof.* Given any Δ ⊇ Γ, n ∈ N and (u_1, N_1), ..., (u_n, N_n) ∈ ⊲^Δ, we have:

$$\sum_{h \in \mathrm{HNF}_\Delta} \mathrm{Red}^{\infty}_{xN_1 \dots N_n,\, h}\, \llbracket h \rrbracket^{\Delta} = \llbracket xN_1 \dots N_n \rrbracket^{\Delta} = \mathrm{pr}_x^{\Delta}\; @\; \llbracket N_1 \rrbracket^{\Delta} \dots \llbracket N_n \rrbracket^{\Delta}.$$

Besides, for all i ≤ n, since (u_i, N_i) ∈ ⊲^Δ we have u_i ≤ Σ_{h∈HNF_Δ} Red^∞_{N_i,h} ⟦h⟧^Δ ≤ ⟦N_i⟧^Δ. The latter inequality holds because Proposition 2 implies that for all k ∈ N, Σ_{h∈HNF_Δ} Red^k_{N_i,h} ⟦h⟧^Δ ≤ ⟦N_i⟧^Δ. The application @ being monotone in both its arguments, we have pr^Γ_x↑^Δ @ u_1 ... u_n ≤ pr^Δ_x @ ⟦N_1⟧^Δ ... ⟦N_n⟧^Δ.

**Lemma 8.** *Let* (v, M) ∈ P(D^Γ ⇒ D) × Λ^+_Γ*. We have* (v, M) ∈ ⊲^Γ *iff for all* (r, L) ∈ ⊲^Δ *with* Δ ⊇ Γ*,* (v↑^Δ @ r, ML) ∈ ⊲^Δ*.*

*Proof.* If (v, M) ∈ ⊲^Γ = φ^Γ(⊲) and (r, L) ∈ ⊲^Δ, then using the definition of φ it is easy to check that (v↑^Δ @ r, ML) ∈ ⊲^Δ. Conversely, if for all (r, L) ∈ ⊲^Δ we have (v↑^Δ @ r, ML) ∈ ⊲^Δ and we want to prove that (v, M) ∈ φ^Γ(⊲), then the condition of Eq. (13) trivially holds whenever n ≥ 1, so we need only consider the case n = 0.

Suppose that for all (r, L) ∈ ⊲^Δ, (v↑^Δ @ r, ML) ∈ ⊲^Δ; let us prove that v ≤ Σ_{h∈HNF_Γ} Red^∞_{M,h} ⟦h⟧^Γ. Let x be a fresh variable; according to Lemma 7 we have (pr^{x,Γ}_x, x) ∈ ⊲^{x,Γ}, so v↑^{x,Γ} @ pr^{x,Γ}_x ≤ Σ_{h∈HNF_{x,Γ}} Red^∞_{Mx,h} ⟦h⟧^{x,Γ}. Then:

$$\begin{aligned} v &= \Lambda(v\!\upharpoonright^{x,\Gamma} @\; \mathrm{pr}_x^{x,\Gamma}) & &\text{extensionality of } \mathcal{D} \\ &\leq \Lambda\Big(\sum_{h \in \mathrm{HNF}_{x,\Gamma}} \mathrm{Red}^{\infty}_{Mx,h}\, \llbracket h \rrbracket^{x,\Gamma}\Big) & &\text{monotonicity of } \Lambda(\cdot), \text{ Lemma 1} \\ &= \sum_{h \in \mathrm{HNF}_{x,\Gamma}} \mathrm{Red}^{\infty}_{Mx,h}\, \Lambda(\llbracket h \rrbracket^{x,\Gamma}) & &\text{linearity and continuity of } \Lambda(\cdot) \\ &= \sum_{h \in \mathrm{HNF}_{x,\Gamma}} \mathrm{Red}^{\infty}_{Mx,h}\, \llbracket \lambda x. h \rrbracket^{\Gamma} & &\text{def. of } \Lambda(\cdot). \end{aligned}$$

One can check that for h ∈ HNF_{x,Γ}, Red^∞_{Mx,h} = Σ_{h_0∈HNF_Γ} Red^∞_{M,h_0} Red^∞_{h_0x,h} (recall that x is not free in M). If h_0 is a head-normal form yP_1 ... P_m, then Red^∞_{h_0x,h} ≠ 0 only if h = yP_1 ... P_m x with x ∉ FV(yP_1 ... P_m) (and then Red^∞_{h_0x,h} = 1). If h_0 = λx_0.h′, then Red^∞_{h_0x,h} ≠ 0 only if h = h′{x/x_0} (and then Red^∞_{h_0x,h} = 1). In the first case we have ⟦λx.h⟧^Γ = ⟦λx.(h_0x)⟧^Γ = ⟦h_0⟧^Γ. In the second case λx.h = h_0 modulo α-equivalence, and ⟦λx.h⟧^Γ = ⟦h_0⟧^Γ. Therefore: v ≤ Σ_{h_0∈HNF_Γ} Red^∞_{M,h_0} ⟦h_0⟧^Γ.

**Lemma 9 (Key Lemma).** *For all* M ∈ Λ^+_Γ *with* Γ = {y_1, ..., y_n}*, for all* Δ ⊇ Γ*, for all* u_1, ..., u_n *in* P(D^Δ ⇒ D) *and* N_1, ..., N_n *in* Λ^+_Δ *with* (u_i, N_i) ∈ ⊲^Δ*,*

$$\llbracket M \rrbracket^{\Gamma} \circ \langle u_1, \ldots, u_n \rangle \;\lhd^{\Delta}\; M\{N_1/y_1, \ldots, N_n/y_n\}.$$

*Proof.* The proof is by induction on M. The abstraction case uses Lemmas 5 and 8, the application case uses Lemma 8, and the barycentric sum case uses Lemma 6.

**Theorem 1 (Strong adequacy).** *For all* M ∈ Λ^+_Γ *we have:*

$$\llbracket M \rrbracket^\Gamma = \sum_{h \in \mathrm{HNF}_{\Gamma}} \mathrm{Red}^\infty_{M,h}\, \llbracket h \rrbracket^\Gamma.$$

*Proof.* The invariance of the interpretation under reduction (Proposition 2) gives that for all n ∈ N, ⟦M⟧^Γ = Σ_{N∈Λ^+_Γ} Red^n_{M,N} ⟦N⟧^Γ ≥ Σ_{h∈HNF_Γ} Red^n_{M,h} ⟦h⟧^Γ. Letting n → ∞ we get ⟦M⟧^Γ ≥ Σ_{h∈HNF_Γ} Red^∞_{M,h} ⟦h⟧^Γ.

Conversely, using Lemma 9 with Δ = Γ and (u_i, N_i) = (pr^Γ_{y_i}, y_i), which is in ⊲^Γ thanks to Lemma 7, we get (⟦M⟧^Γ, M) ∈ ⊲^Γ. The definition of ⊲ = φ(⊲) with Δ = Γ and n = 0 gives ⟦M⟧^Γ ≤ Σ_{h∈HNF_Γ} Red^∞_{M,h} ⟦h⟧^Γ.

#### **5 Nakajima Trees and Full Abstraction**

We apply our strong adequacy result to infer full abstraction (Theorem 2). As mentioned in the Introduction, the bridge linking syntax and semantics is given by the notion of probabilistic Nakajima tree, defined by Leventis [12] (here Definitions 2 and 3) in order to prove a separation theorem for Λ^+. Lemma 11 shows that the equality of Nakajima trees implies denotational equality. The proof of this lemma uses the strong adequacy property.

**Definition 2.** *The set* PT^η_ℓ *of Nakajima trees with depth at most* ℓ ∈ N *is the set of subprobability distributions over the value Nakajima trees in* VT^η_ℓ*. These sets are defined by mutual recursion as follows:*

$$\begin{aligned} \mathcal{VT}_0^\eta &= \emptyset, \qquad \mathcal{VT}_{\ell+1}^\eta = \left\{\lambda \boldsymbol{x}.y\,\boldsymbol{T} \mid \boldsymbol{x} \in \mathcal{V}^\mathbb{N},\ y \in \mathcal{V},\ \boldsymbol{T} \in (\mathcal{PT}_\ell^\eta)^\mathbb{N}\right\}, \\ \mathcal{PT}_0^\eta &= \{\perp\}, \qquad \mathcal{PT}_{\ell+1}^\eta = \Big\{T \in [0,1]^{\mathcal{VT}_{\ell+1}^\eta} \;\Big|\; \sum_{t \in \mathcal{VT}_{\ell+1}^\eta} T(t) \le 1\Big\}. \end{aligned}$$

The notation ⊥ represents the empty function (i.e. the distribution with empty support), encoding undefinedness and allowing directed sets of approximants.

Value Nakajima trees represent infinitary η-long head-normal forms: up to η-equivalence, every head-normal form h = λx_1 ... x_n.y M_1 ... M_m is equal to λx_1 ... x_{n+k}.y M_1 ... M_m x_{n+1} ... x_{n+k} for any k ∈ N and x_{n+1}, ..., x_{n+k} fresh, and value Nakajima trees are infinitary variants of such η-expansions.
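The finite η-expansion step described above can be sketched on a first-order representation of head-normal forms (a hypothetical Python illustration; the triple encoding and the fresh-name generator are ours):

```python
# Hypothetical sketch: one step of the η-expansion described above. A
# head-normal form λx1..xn.y M1..Mm is encoded as (binders, head, args).

def eta_expand(hnf, k, fresh):
    """λx1..xn.y M1..Mm  ↦  λx1..xn z1..zk . y M1..Mm z1..zk."""
    binders, head, args = hnf
    new = [fresh(i) for i in range(k)]
    return (binders + new, head, args + [("var", v) for v in new])

# λx1.y (Ω x1) x1, expanded with two fresh variables z1, z2.
h = (["x1"], "y", [("app", "Omega", "x1"), ("var", "x1")])
h2 = eta_expand(h, 2, lambda i: f"z{i+1}")
assert h2 == (["x1", "z1", "z2"], "y",
              [("app", "Omega", "x1"), ("var", "x1"), ("var", "z1"), ("var", "z2")])
```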

**Definition 3.** *By mutual recursion we associate value trees* VT^η_ℓ *with head-normal forms and general trees* PT^η_ℓ *with general* Λ^+ *terms:*

$$\begin{aligned} VT^{\eta}_{\ell+1}(\lambda x_1 \dots x_n.y\, M_1 \dots M_m) \qquad\qquad\\ = \lambda x_1 \dots x_n x_{n+1} \dots\,.\; y\; PT^{\eta}_{\ell}(M_1) \dots PT^{\eta}_{\ell}(M_m)\; PT^{\eta}_{\ell}(x_{n+1}) \dots \end{aligned}$$

*where the* x_i*'s are pairwise distinct variables and, for* i > n*, the* x_i*'s are fresh;*

$$PT^\eta_0(M) = \bot, \qquad PT^\eta_{\ell+1}(M) = t \mapsto \sum_{h \in (VT^\eta_{\ell+1})^{-1}(t)} \mathrm{Red}^{\infty}_{M,h}.$$

*Remark 1.* In [12], following the definition of deterministic Nakajima trees in [1], the value tree VT^η_{ℓ+1}(λx_1 ... x_n.y M_1 ... M_m) includes explicitly the difference n − m. This yields a heavier but somewhat more convenient definition, as then Lemma 10 also holds for ℓ = 1. In this paper we chose the lighter definition. By Lemma 10, this choice does not affect the induced equality on Nakajima trees.

*Example 5.* Figure 2(a) depicts some examples of value Nakajima trees associated with the head-normal form λx_1.y(**Ω**x_1)x_1. Notice that these trees are equivalent to the Nakajima trees associated with y(**Ω**x_1), as well as with y**Ω**. In fact, all these terms are contextually equivalent.

Figure 2(b) shows the Nakajima tree of depth 2 associated with the term y(u +_q v) +_p (y +_{p′} **Ω**). Notice that the two sums +_p and +_{p′} contribute to the same subprobability distribution, whereas they are kept distinct from the sum +_q on the argument side of an application.

Figure 2(c) gives some examples of the Nakajima trees associated with the term **Θ**(λf.(y +_p yf)), discussed also in Examples 2 and 3. Notice that the more the depth ℓ increases, the more the support of the top-level distribution grows.

It is clear that the family (PT^η_ℓ(M))_{ℓ∈N} converges to a limit, but we do not need to make this explicit for our purposes, so we refrain from defining the topology over the PT^η_ℓ yielding the convergence of (PT^η_ℓ(M))_{ℓ∈N}.

The next lemma shows that the first levels of the value tree VT^η_ℓ of a head-normal form h give a lot of information about the shape of h.

**Lemma 10.** *Given two head-normal forms* h = λx_1 ... x_n.yM_1 ... M_m *and* h′ = λx′_1 ... x′_{n′}.y′M′_1 ... M′_{m′} *and any* ℓ ≥ 2*, if* VT^η_ℓ(h) = VT^η_ℓ(h′)*, then* y = y′ *and* n − m = n′ − m′*.*

*Proof.* The fact that y = y′ follows immediately from the definition of VT^η_ℓ. Concerning the second equality, one can assume n = n′ by η-expanding one of the two terms; in fact VT^η_ℓ is invariant under η-expansion. Modulo α-equivalence, we can then restrict ourselves to the case h = λx_1 ... x_n.yM_1 ... M_m and h′ = λx_1 ... x_n.yM′_1 ... M′_{m′}.

Suppose, for the sake of contradiction, that m > m′. Then we should have PT^η_{ℓ−1}(M_{m′+1}) = PT^η_{ℓ−1}(x_{n+1}), where x_{n+1} is a fresh variable; in particular x_{n+1} ∉ FV(M_{m′+1}). Since ℓ − 1 > 0, we have that PT^η_{ℓ−1}(x_{n+1})(t) = 1 if t is equal to λz_1z_2 ... . x_{n+1} PT^η_{ℓ−2}(z_1) PT^η_{ℓ−2}(z_2) ..., and PT^η_{ℓ−1}(x_{n+1})(t) = 0 otherwise. So PT^η_{ℓ−1}(M_{m′+1}) = PT^η_{ℓ−1}(x_{n+1}) implies that Red^∞_{M_{m′+1},h} > 0 for some h having x_{n+1} as a free variable, which is impossible since x_{n+1} ∉ FV(M_{m′+1}).

Thanks to the strong adequacy property we can prove that, for M ∈ Λ^+_Γ, each coefficient of ⟦M⟧^Γ is entirely determined by PT^η_ℓ(M) for ℓ large enough. To do so we define the following size on |D|, M_f(|D|) and M_f(|D|)^Γ × |D|:

**Fig. 2.** Examples of Nakajima trees. Distributions are represented by barycentric sums, depicted as + nodes whose outgoing edges are weighted by probabilities.

– #(∗) = 0 for the base element,


**Lemma 11.** *Given* ℓ ∈ N *and* M, N ∈ Λ^+_Γ*, if* PT^η_ℓ(M) = PT^η_ℓ(N) *then for any* (𝐦, d) ∈ M_f(|D|)^Γ × |D| *with* #(𝐦, d) < ℓ*, we have* ⟦M⟧^Γ_{𝐦,d} = ⟦N⟧^Γ_{𝐦,d}*.*

*Proof.* We proceed by induction on ℓ. If ℓ ≤ 1, then #(𝐦, d) = 0 implies d = ∗ and, for every x ∈ Γ, 𝐦_x = []. In this case we remark that both ⟦M⟧^Γ_{𝐦,d} and ⟦N⟧^Γ_{𝐦,d} are null. This can easily be checked by inspecting the rules of Fig. 1, computing the matrix denoting a term by structural induction over the term.

Otherwise, by Theorem 1, we have: ⟦M⟧^Γ_{𝐦,d} = Σ_{h∈HNF_Γ} Red^∞_{M,h} ⟦h⟧^Γ_{𝐦,d}. This last sum can be refactored as Σ_{t∈VT^η_ℓ} Σ_{h∈(VT^η_ℓ)^{-1}(t)} Red^∞_{M,h} ⟦h⟧^Γ_{𝐦,d}. A similar reasoning for N gives ⟦N⟧^Γ_{𝐦,d} = Σ_{t∈VT^η_ℓ} Σ_{h∈(VT^η_ℓ)^{-1}(t)} Red^∞_{N,h} ⟦h⟧^Γ_{𝐦,d}.

Let us fix t ∈ VT^η_ℓ and (𝐦, d) ∈ M_f(|D|)^Γ × |D| with #(𝐦, d) < ℓ, and let us prove that:

(★) for any h, h′ ∈ (VT^η_ℓ)^{−1}(t), we have ⟦h⟧^Γ_{𝐦,d} = ⟦h′⟧^Γ_{𝐦,d}.

Notice that (★) implies ⟦M⟧^Γ_{𝐦,d} = ⟦N⟧^Γ_{𝐦,d}, since the hypothesis PT^η_ℓ(M) = PT^η_ℓ(N) gives Σ_{h∈(VT^η_ℓ)^{−1}(t)} Red^∞_{M,h} = Σ_{h∈(VT^η_ℓ)^{−1}(t)} Red^∞_{N,h} for any t ∈ VT^η_ℓ.

Let then h = λx_1 ... x_n.yM_1 ... M_k and h′ = λx_1 ... x_{n′}.y′M′_1 ... M′_{k′}. Since ℓ ≥ 2, VT^η_ℓ(h) = VT^η_ℓ(h′) implies by Lemma 10 that y = y′ and n − k = n′ − k′. Since D is extensional (see Sect. 3), by η-expanding one of the two terms we can suppose n = n′ and hence k = k′. Besides, if n > 0, writing d = m :: d′, we have ⟦h⟧^Γ_{𝐦,d} = ⟦λx_2 ... x_n.yM_1 ... M_k⟧^{x_1,Γ}_{(m,𝐦),d′} with #((m,𝐦), d′) = #(𝐦, d), and similarly for ⟦h′⟧^Γ_{𝐦,d}. So we can reduce to the case h = yM_1 ... M_k and h′ = yM′_1 ... M′_k. If k = 0 the claim is trivial; otherwise, by unfolding the applications of h using the applicative case of Fig. 1, we have:

$$\llbracket h\rrbracket^\Gamma_{\vec m, d} \;=\; \sum_{\substack{(\vec m_0,\dots,\vec m_k)\\ \text{s.t. } \vec m = \bigsqcup_i \vec m_i}}\ \ \sum_{\substack{m_1,\dots,m_k\\ \in\,\mathcal M_f(|\mathcal D|)}} \llbracket y\rrbracket^\Gamma_{\vec m_0,\,m_1::\cdots::m_k::d}\ \big(\llbracket M_1\rrbracket^\Gamma\big)_{\vec m_1, m_1} \cdots \big(\llbracket M_k\rrbracket^\Gamma\big)_{\vec m_k, m_k}$$

and the same for $h'$, replacing each $M_i$ with $M'_i$. Notice that $\llbracket y\rrbracket^\Gamma_{\vec m_0,\,m_1::\cdots::m_k::d} \neq 0$ implies $(\vec m_0)_y = [m_1 :: \cdots :: m_k :: d]$, hence $\#(m_i) < \#(\vec m_0)$ for any $i \le k$, thus $\#(\vec m_i, m_i) < \#(\vec m_i) + \#(\vec m_0) \le \#(\vec m) \le \#(\vec m, d) < \ell$, and indeed $\#(\vec m_i, m_i) < \ell - 1$. Moreover, the hypothesis $VT^\eta_{\ell-1}(h) = VT^\eta_{\ell-1}(h')$ implies $PT^\eta_{\ell-2}(M_i) = PT^\eta_{\ell-2}(M'_i)$ for any $i \le k$, so we conclude by the induction hypothesis on each term appearing in the sums $(\llbracket M_i\rrbracket^\Gamma)_{\vec m_i, m_i}$ and $(\llbracket M'_i\rrbracket^\Gamma)_{\vec m_i, m_i}$.

**Corollary 1.** *Let $M, N \in \Lambda^+_\Gamma$. If $PT^\eta_\ell(M) = PT^\eta_\ell(N)$ for all $\ell \in \mathbb N$, then $\llbracket M\rrbracket^\Gamma = \llbracket N\rrbracket^\Gamma$.*

**Theorem 2.** *For any two terms $M, N \in \Lambda^+_\Gamma$, the following are equivalent: (1) $M$ and $N$ are contextually equivalent; (2) $PT^\eta(M) = PT^\eta(N)$; (3) $\llbracket M\rrbracket^\Gamma = \llbracket N\rrbracket^\Gamma$.*
*Proof.* (1) to (2) is given by [12, Theorem 10.1]. From (2) and Corollary 1, we get (3). Finally, (3) implies (1) by the adequacy of probabilistic coherence spaces, proven in [6, Corollary 25].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Sound and Complete Logic for Algebraic Effects**

Cristina Matache(B) and Sam Staton

University of Oxford, Oxford, UK cristina.matache@balliol.ox.ac.uk

**Abstract.** This work investigates three notions of program equivalence for a higher-order functional language with recursion and general algebraic effects, in which programs are written in continuation-passing style. Our main contribution is the following: we define a logic whose formulas express program properties and show that, under certain conditions which we identify, the induced program equivalence coincides with a contextual equivalence. Moreover, we show that this logical equivalence also coincides with an applicative bisimilarity. We exemplify our general results with four effects: nondeterminism, probabilistic choice, global store, and I/O.

### **1 Introduction**

Logic is a fundamental tool for specifying the behaviour of programs. A general approach is to consider that a logical formula φ encodes a program property, and the satisfaction relation of the logic, <sup>t</sup> <sup>|</sup><sup>=</sup> <sup>φ</sup>, asserts that program <sup>t</sup> enjoys property φ. An example is Hennessy-Milner logic [12] used to model concurrency and nondeterminism. Other program logics include Hoare logic [13], which describes imperative programs with state, and more recently separation logic [28]. Both state and nondeterminism are examples of *computational effects* [25], which represent impure behaviour in a functional programming language. The logics mentioned so far concern languages with first-order functions, so as a natural extension, we are interested in finding a logic which describes higher-order programs with general effects.

The particular flavour of effects we consider is that of *algebraic effects* developed by Plotkin and Power [32–34]. This is a unified framework in which effectful computation is triggered by a set of operations whose behaviour is axiomatized by a set of equations. For example, nondeterminism is given by a binary choice operation or(−, <sup>−</sup>) that satisfies the equations of a semilattice. Thus, general effectful programs have multiple possible execution paths, which can be visualized as an (effect) tree, with effect operations labelling the nodes. Consider the following function or suc which has three possible return values, and the effect tree of (or suc 2):

$$\mathtt{or\_suc} = \lambda x{:}\mathtt{nat}.\ or(x,\ or(x+1,\ x+2)) \qquad\qquad \llbracket \mathtt{or\_suc}\ 2\rrbracket \;=\; or(2,\ or(3,\ 4))$$

Apart from state and nondeterminism, examples of algebraic effects include probabilistic choice and input and output operations.
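To make the effect-tree picture concrete, here is a minimal Python sketch. The tuple encoding of trees, and the reading of `or_suc` as a function with possible return values `x`, `x + 1`, `x + 2`, are illustrative assumptions, not notation from the paper:

```python
# Finite effect trees encoded as nested tuples: an internal node is
# ("or", left, right); leaves are the possible return values.
# This encoding is an illustrative assumption for this sketch.

def or_suc(x):
    """Direct-style or_suc: returns x, x + 1, or x + 2 nondeterministically,
    represented here by the effect tree or(x, or(x + 1, x + 2))."""
    return ("or", x, ("or", x + 1, x + 2))

# The effect tree of (or_suc 2): an or-node with leaf 2 and a nested
# or-node with leaves 3 and 4.
print(or_suc(2))
```

Each path from the root to a leaf corresponds to one possible execution of the program.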

Apart from providing a specification language for programs, a logic can also be used to compare two different programs. This leads to a notion of program equivalence: two programs are equivalent when they satisfy exactly the same formulas in the logic.

Many other definitions of program equivalence for higher-order languages exist. An early notion is contextual equivalence [26], which asserts that two programs are equivalent if they have the same observable behaviour in all program contexts. However, this is hard to establish in practice due to the quantification over all contexts. Another approach, which relies on the existence of a suitable denotational model of the language, is checking equality of denotations. Yet another notion, meant to address the shortcomings of the previous two, is that of applicative bisimilarity [1].

Given the wide range of definitions of program equivalence, comparing them is an interesting question. For example, the equivalence induced by Hennessy-Milner logic is known to coincide with bisimilarity for CCS. Thus, we not only aim to find a logic describing general algebraic effects, but also to compare it to existing notions of program equivalence.

Program equivalence for general algebraic effects has been studied by Johann, Simpson and Voigtl¨ander [17] who define a notion of contextual equivalence and a corresponding logical relation. Dal Lago, Gavazzo and Levy [7] provide an abstract treatment of applicative bisimilarity in the presence of algebraic effects. Working in a typed, call-by-value setting, Simpson and Voorneveld [38] propose a modal logic for effectful programs whose induced program equivalence coincides with applicative bisimilarity, but not with contextual equivalence (see counterexample in Sect. 5). Dal Lago, Gavazzo and Tanaka [8] propose a notion of applicative similarity that coincides with contextual equivalence for an untyped, call-by-name effectful calculus.

These papers provide the main starting point for our work. Our goal is to find a logic of program properties which characterizes contextual equivalence for a higher-order language with algebraic effects. We study a typed call-by-value language in which programs are written in continuation-passing style (CPS). CPS is known to simplify contextual equivalence, through the addition of control operators (e.g. [5]), but it also implies that all notions of program equivalence we define can only use continuations to test return values. Contextual equivalence and bisimilarity for lambda-calculi with control, but without general effects, have been studied extensively (e.g. [4,15,23,41]).

In CPS, functions receive as argument the continuation (which is itself a function) to which they pass their return value. Consider the function that adds two natural numbers. This usually has type nat → nat → nat, but its CPS version is defined as: $\mathtt{addk} = \lambda(n{:}\mathtt{nat},\ m{:}\mathtt{nat},\ k{:}\mathtt{nat}{\to}\mathtt{R}).\ k\,(n+m)$ for some fixed return type R. The function or_suc becomes in CPS:

or succ <sup>=</sup> <sup>λ</sup>(x:nat, k:nat→R). or(k x, or(addk (x, <sup>1</sup>, k), addk (x, <sup>2</sup>, k))).

A general translation of direct-style functions into CPS can be found in Sect. 5.
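The CPS discipline itself can be sketched in plain Python. The functions below mirror the shape of addk and or_succ; modelling the nondeterministic `or` operation by a tuple that collects both branches is our illustrative assumption:

```python
# CPS in plain Python: each function takes its continuation k as an
# extra argument and passes k its "return" value instead of returning.

def addk(n, m, k):
    # CPS addition: hand n + m to the continuation.
    return k(n + m)

def or_(b1, b2):
    # Illustrative model of binary nondeterministic choice: force both
    # thunked branches and collect the results in an effect-tree tuple.
    return ("or", b1(), b2())

def or_succ(x, k):
    # Mirrors: or(k x, or(addk(x, 1, k), addk(x, 2, k)))
    return or_(lambda: k(x),
               lambda: or_(lambda: addk(x, 1, k),
                           lambda: addk(x, 2, k)))

# Passing the identity as continuation exposes the three values 2, 3, 4
# in the resulting tree.
print(or_succ(2, lambda v: v))
```

Here the continuation is used to test return values, exactly the role continuations play in the equivalences defined later.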

We fix a calculus named ECPS (Sect. 2), in which programs are not expected to return, except through a call to the continuation. Contextual equivalence is defined using a custom set of observations $\mathfrak P$, where the elements of $\mathfrak P$ are sets of effect trees. We consider a logic $\mathcal F$ whose formulas express properties of ECPS programs (Sect. 3). For example, or_succ satisfies the following formula: $\phi = (\{2\},\ (\{3\}\vee\{4\}) \mapsto \Box) \mapsto \Diamond$.

Here, $\Diamond$ is the set of all effect trees for which at least one execution path succeeds, and $\Box$ is the set of trees that always succeed. So or_succ $\models_{\mathcal F} \phi$ says that, when given argument 2 and a continuation that always succeeds for input 3 or 4, or_succ *may* succeed. In other words, or_succ may 'return' 3 or 4 to the continuation. In contrast, $\phi' = (\{2\},\ (\{3\}\vee\{4\}) \mapsto \Box) \mapsto \Box$ says that the program or_succ *must* return 3 or 4 to the continuation. Thus or_succ $\not\models_{\mathcal F} \phi'$, because the continuation $k$ might diverge on 2.

Another example can be obtained by generalizing the or succ function to take a function as a parameter, rather than using addk:

$$\mathtt{or\_succ'} = \lambda(x{:}\mathtt{nat},\ k{:}\mathtt{nat}{\to}\mathtt{R},\ f{:}(\mathtt{nat},\ \mathtt{nat},\ \mathtt{nat}{\to}\mathtt{R}){\to}\mathtt{R}).\ or(k\ x,\ or(f\,(x,1,k),\ f\,(x,2,k)))$$

$$\mathtt{or\_succ'} \models_{\mathcal F} \big(\{2\},\ \{4\}\mapsto\Diamond,\ (\{2\},\ \{2\},\ \{4\}\mapsto\Diamond)\mapsto\Diamond\big) \mapsto \Diamond.$$

The formula above says that or succ' may call f with arguments 2, 2 and k.

The main theorem concerning the logic F (Theorem 1) is that, under certain restrictions on the observations in P, logical equivalence coincides with contextual equivalence. In other words, F is sound and complete with respect to contextual equivalence. The proof of this theorem, outlined in Sect. 4, involves applicative bisimilarity as an intermediate step. Thus, we show in fact that three notions of program equivalence for ECPS are the same: logical equivalence, contextual equivalence and applicative bisimilarity. Due to space constraints, proofs are omitted but they can be found in [21].

#### **2 Programming Language – ECPS**

We consider a simply-typed functional programming language with general recursion, a datatype of natural numbers and general algebraic effects as introduced by Plotkin and Power [32]. We will refer to this language as ECPS because programs are written in continuation-passing style.

ECPS distinguishes between terms which can reduce further, named computations, and values, which cannot reduce. ECPS is a variant of both Plotkin's PCF [31] and Levy's Jump-With-Argument language [20], extended with algebraic effects. A fragment of ECPS is discussed in [18] in connection with logic.

$$\begin{array}{ll}\text{Types} & A, A_1, B ::= (A_1,\dots,A_n) \to \mathtt{R} \mid \mathtt{nat} \qquad (n \ge 0)\\ \text{Typing contexts} & \Gamma ::= \emptyset \mid \Gamma,\ x : A.\end{array}$$

The only base type in ECPS is nat. The return type of functions, R, is fixed and is *not* a first-class type. Intuitively, we consider that functions are not expected to return. A type in direct style <sup>A</sup> <sup>→</sup> <sup>B</sup> becomes in ECPS: (A, B→R)→R. In the typing context (Γ, x : <sup>A</sup>), the free variable <sup>x</sup> does not appear in Γ.

First, consider the pure fragment of ECPS, without effects, named CPS:

$$\begin{array}{ll}\text{Values} & v, w ::= \mathtt{zero} \mid \mathtt{succ}(v) \mid \lambda(x_1{:}A_1,\dots,x_n{:}A_n).\,t \mid x \qquad (n \ge 0)\\ \text{Computations} & s, t ::= v(w_1,\dots,w_n) \mid \mathtt{case}\ v\ \mathtt{of}\ \{\mathtt{zero} \Rightarrow s,\ \mathtt{succ}(x) \Rightarrow t\} \mid (\mathtt{rec}\ x.v)(w_1,\dots,w_n).\end{array}$$

Variables, natural numbers and lambdas are values. Computations include function application and an eliminator for natural numbers. The expression rec x.v is a recursive definition of the function v, which must be applied. If exactly one argument appears in a lambda abstraction or an application term, we will sometimes omit the parentheses around that argument.

There are two typing relations in CPS: one for values, $\Gamma \vdash v : A$, which says that value $v$ has type $A$ in the context $\Gamma$, and one for computations, $\Gamma \vdash t : \mathtt{R}$, which says that $t$ is well-formed given the context $\Gamma$. All computations have the same return type $\mathtt{R}$. We also define the *order of a type* recursively; roughly speaking, it counts the number of function arrows $\to$ in a type.

$$\frac{}{\Gamma, x : A \vdash x : A} \qquad \frac{}{\Gamma \vdash \mathtt{zero} : \mathtt{nat}} \qquad \frac{\Gamma \vdash v : \mathtt{nat}}{\Gamma \vdash \mathtt{succ}(v) : \mathtt{nat}} \qquad \frac{\Gamma, x_1{:}A_1, \dots, x_n{:}A_n \vdash t : \mathtt{R}}{\Gamma \vdash \lambda(x_1{:}A_1,\dots,x_n{:}A_n).t : (A_1,\dots,A_n)\to\mathtt{R}}$$

$$\frac{\Gamma \vdash v : (\overrightarrow{A})\to\mathtt{R} \quad (\Gamma \vdash w_i : A_i)_i}{\Gamma \vdash v(\overrightarrow{w}) : \mathtt{R}} \qquad \frac{\Gamma, x : (\overrightarrow{A})\to\mathtt{R} \vdash v : (\overrightarrow{A})\to\mathtt{R} \quad (\Gamma \vdash w_i : A_i)_i}{\Gamma \vdash (\mathtt{rec}\ x.v)(\overrightarrow{w}) : \mathtt{R}} \qquad \frac{\Gamma \vdash v : \mathtt{nat} \quad \Gamma \vdash s : \mathtt{R} \quad \Gamma, x{:}\mathtt{nat} \vdash t : \mathtt{R}}{\Gamma \vdash \mathtt{case}\ v\ \mathtt{of}\ \{\mathtt{zero}\Rightarrow s,\ \mathtt{succ}(x)\Rightarrow t\} : \mathtt{R}}$$

$$\mathrm{ord}(\mathtt{nat}) = 0 \qquad\qquad \mathrm{ord}((A_1,\dots,A_n)\to\mathtt{R}) = 1 + \max_i\,\mathrm{ord}(A_i) \quad (\text{where } \max \emptyset = 0).$$

To introduce algebraic effects into our language, we consider a new kind of context Σ, disjoint from Γ, which we call an *effect context*. The symbols σ appearing in Σ stand for effect operations and their type must have either order 1 or 2. For example, the binary choice operation or : (()→R, ()→R)→<sup>R</sup> expects two thunked computations. The output operation output : (nat, ()→R)→<sup>R</sup> expects a parameter and a continuation. An operation signifying success, which takes no arguments, is <sup>↓</sup> : ()→R. Roughly, <sup>Σ</sup> could be regarded as a countable algebraic signature.

We extend the syntax of CPS with effectful computations. The typing relations now carry a <sup>Σ</sup> context: <sup>Γ</sup> <sup>Σ</sup> <sup>v</sup> : <sup>A</sup> and <sup>Γ</sup> <sup>Σ</sup> <sup>t</sup> : <sup>R</sup>. Otherwise, the typing judgements remain unchanged; we have a new rule for typing effect operations:

$$s, t \coloneqq \dots \mid \sigma(\overrightarrow{v}, \overrightarrow{k}) \qquad \frac{\sigma : (\overrightarrow{A}, \overrightarrow{B}) \to \mathbb{R} \in \Sigma \quad (\varGamma \vdash\_{\Sigma} v\_i : A\_i)\_i \quad (\varGamma \vdash\_{\Sigma} k\_j : B\_j)\_j}{\varGamma \vdash\_{\Sigma} \sigma(\overrightarrow{v}, \overrightarrow{k}) : \mathbb{R}}$$

In ECPS, the only type with order 0 is nat, so in fact A<sup>i</sup> = nat for all i. Notice that the grammar does not allow function abstraction over a symbol from Σ and that σ is not a first-class term. So we can assume that Σ is fixed, as in the examples from Sect. 2.1.

As usual, we identify terms up to alpha-equivalence. Substitution of values for free variables that are not operations, $v[w/x]$ and $t[w/x]$, is defined in the standard way by induction on the structure of $v$ and $t$. We use $\overline n$ to denote the term $\mathtt{succ}^n(\mathtt{zero})$. Let $(\vdash_\Sigma)$ be the set of well-formed closed computations and $(\vdash_\Sigma A)$ the set of closed values of type $A$.

#### **2.1 Operational Semantics**

We define a family of relations on closed computation terms, $({\longrightarrow}) \subseteq (\vdash_\Sigma) \times (\vdash_\Sigma)$, one for each effect context $\Sigma$:

$$\begin{array}{c}(\lambda(\overrightarrow{x{:}A}).t)\,(\overrightarrow{w}) \longrightarrow t[\overrightarrow{w}/\overrightarrow{x}]\\ (\mathtt{rec}\ x.v)\,(\overrightarrow{w}) \longrightarrow (v[(\lambda(\overrightarrow{y{:}A}).(\mathtt{rec}\ x.v)(\overrightarrow{y}))/x])\,(\overrightarrow{w})\\ \mathtt{case\ zero\ of}\ \{\mathtt{zero}\Rightarrow s,\ \mathtt{succ}(x)\Rightarrow t\} \longrightarrow s\\ \mathtt{case\ succ}(v)\ \mathtt{of}\ \{\mathtt{zero}\Rightarrow s,\ \mathtt{succ}(x)\Rightarrow t\} \longrightarrow t[v/x].\end{array}$$

Observe that the reduction given by $\longrightarrow$ can either run forever or terminate with an effect operation. If the effect operation does not take any arguments of order 1 (i.e. continuations), the computation stops. If the reduction reaches $\sigma(\overrightarrow{v}, \overrightarrow{k})$, the intuition is that any continuation $k_i$ may be chosen, and executed with the results of operation $\sigma$. Thus, repeatedly evaluating effect operations leads to the construction of an infinitely branching tree (similar to that in [32]), as we now explain, which we call an *effect tree*. A path in the tree represents a possible execution path of the program.

An effect tree, of possibly infinite depth and width, can contain:

- leaves labelled $\bot$, standing for nontermination;
- leaves labelled $\sigma_{\overrightarrow{n}}$, where $\sigma \in \Sigma$ takes no continuation arguments (for example $\downarrow$);
- nodes labelled $\sigma_{\overrightarrow{n}}$, where $\sigma : (\overrightarrow{A}, \overrightarrow{B}) \to \mathtt{R} \in \Sigma$ has continuation arguments; such a node has one child for each continuation argument $k_i$ and each tuple of natural numbers that could be passed to $k_i$.
Denote the set of all effect trees by $Trees_\Sigma$. This set has a partial order: $tr_1 \le tr_2$ if and only if $tr_1$ can be obtained by replacing subtrees of $tr_2$ by $\bot$. Every ascending chain $tr_1 \le tr_2 \le \dots$ has a least upper bound $\bigsqcup_n tr_n$. In fact $Trees_\Sigma$ is the free pointed $\Sigma$-algebra [2] and therefore also has a coinductive property [9].
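On finite trees, the approximation order can be sketched as a simple recursion. The tuple encoding and the names `less_eq`, `"bot"` (for $\bot$), `"down"` (for $\downarrow$) are illustrative assumptions:

```python
# Finite effect trees as nested tuples ("op", child, ...), with leaves
# "bot" (⊥), "down" (↓), or other atoms. t1 <= t2 iff t1 arises from t2
# by replacing some subtrees with ⊥.

def less_eq(t1, t2):
    if t1 == "bot":
        # ⊥ approximates every tree.
        return True
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        # Same operation label, same arity, children pointwise related.
        return (t1[0] == t2[0] and len(t1) == len(t2)
                and all(less_eq(a, b) for a, b in zip(t1[1:], t2[1:])))
    # Leaves other than ⊥ must match exactly.
    return t1 == t2

t = ("or", "down", ("or", "bot", "down"))
print(less_eq(("or", "bot", "bot"), t))                       # pruning is allowed
print(less_eq(("or", "down", "bot"), ("or", "bot", "down")))  # swapping is not
```

Least upper bounds of chains then correspond to filling in more and more of the pruned subtrees.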

Next, we define a sequence of effect trees associated with each well-formed closed computation. Each element in the sequence can be seen as evaluating the computation one step further. Let $\llbracket - \rrbracket_{(-)} : (\vdash_\Sigma) \times \mathbb N \longrightarrow Trees_\Sigma$ be:

$$\llbracket t\rrbracket_0 = \bot \qquad\qquad \llbracket t\rrbracket_{m+1} = \begin{cases}\llbracket s\rrbracket_m & \text{if } t \longrightarrow s\\ \sigma_{\overrightarrow{v}}\Big(\big((\llbracket k_i\,(\overline{n_1},\dots,\overline{n_{p_i}})\rrbracket_m)_{n_1,\dots,n_{p_i}\in\mathbb N}\big)_i\Big) & \text{if } t = \sigma(\overrightarrow{v}, \overrightarrow{k})\end{cases}$$

These are all the cases since well-formed computations do not get stuck. We define the function $\llbracket - \rrbracket : (\vdash_\Sigma) \longrightarrow Trees_\Sigma$ as the least upper bound of the chain $\{\llbracket t\rrbracket_n\}_{n\in\mathbb N}$: $\llbracket t\rrbracket = \bigsqcup_{n\in\mathbb N} \llbracket t\rrbracket_n$.

We now give examples of effect contexts Σ for different algebraic effects, and of some computations and their associated effect trees.

*Example 1 (Pure functional computation).* $\Sigma = \{\downarrow\,: ()\to\mathtt{R}\}$. Intuitively, $\downarrow$ is a top-level success flag, analogous to a 'barb' in process algebra. This is to ensure a reasonable contextual equivalence for CPS programs, which never actually return results. For example, $loop = (\mathtt{rec}\ f.\,\lambda().\,(f\,()))\ ()$ runs forever, and

$$\mathtt{test\_zero} = \lambda(y{:}\mathtt{nat}).\ \mathtt{case}\ y\ \mathtt{of}\ \{\mathtt{zero} \Rightarrow\ \downarrow(),\ \mathtt{succ}(x) \Rightarrow loop\}.$$

is a continuation that succeeds just when it is passed zero. Generally, an effect tree for a pure computation is either ↓ if it succeeds or ⊥ otherwise.

*Example 2 (Nondeterminism).* $\Sigma = \{or : (()\to\mathtt{R}, ()\to\mathtt{R})\to\mathtt{R},\ \downarrow\,: ()\to\mathtt{R}\}$. Intuitively, $or(k_1, k_2)$ performs a nondeterministic choice between computations $k_1\,()$ and $k_2\,()$. Consider a continuation $\mathtt{test\_3} : \mathtt{nat}\to\mathtt{R}$ that diverges on 3 and succeeds otherwise. The program or_succ from the introduction is in ECPS:

$$\mathtt{or\_succ} = \lambda(x{:}\mathtt{nat},\ k{:}\mathtt{nat}{\to}\mathtt{R}).\ or\big(\lambda().\,k\ x,\ \lambda().\,or(\lambda().\,k\,(\mathtt{succ}(x)),\ \lambda().\,k\,(\mathtt{succ}(\mathtt{succ}(x))))\big)$$

$$\llbracket \mathtt{or\_succ}\ (\overline{2},\ \mathtt{test\_3})\rrbracket = or(\downarrow,\ or(\bot,\ \downarrow))$$

*Example 3 (Probabilistic choice).* <sup>Σ</sup> <sup>=</sup> {p-or : (()→R, ()→R)→R, <sup>↓</sup> : ()→R}. Intuitively, the operation p-or(k1, k2) chooses between k<sup>1</sup> () and k<sup>2</sup> () with probability 0.5. Consider the following term which encodes the geometric distribution:

$$\mathtt{geom} = \lambda k{:}\mathtt{nat}{\to}\mathtt{R}.\ \big(\mathtt{rec}\ f.\ \lambda(n{:}\mathtt{nat},\ k'{:}\mathtt{nat}{\to}\mathtt{R}).\ p\text{-}or(\lambda().\,k'\ n,\ \lambda().\,f\,(\mathtt{succ}(n), k'))\big)\ (\overline{1}, k).$$

The probability that geom passes a number $n > 0$ to its continuation is $2^{-n}$. To test it, consider $k = (\lambda x{:}\mathtt{nat}.\ \downarrow())$; then $\llbracket \mathtt{geom}\ k\rrbracket$ is the infinite tree:

$$\llbracket \mathtt{geom}\ k\rrbracket = p\text{-}or(\downarrow,\ p\text{-}or(\downarrow,\ p\text{-}or(\downarrow,\ \dots)))$$

*Example 4 (Global store).* L is a finite set of locations storing natural numbers and <sup>Σ</sup> <sup>=</sup> {lookup<sup>l</sup> : (nat→R)→R, update<sup>l</sup> : (nat,()→R)→<sup>R</sup> <sup>|</sup> <sup>l</sup> <sup>∈</sup> <sup>L</sup>}∪{↓ : ()→R}. Intuitively, lookupl(k) looks up the value at storage location l, if this is n it continues with k (n). For updatel(v, k) the intuition is: write the number v in location l then continue with the computation k (). For example:

$$\big\llbracket \mathtt{update}_{l_0}\big(\overline{1},\ \lambda().\,\mathtt{lookup}_{l_0}(\lambda x{:}\mathtt{nat}.\,\mathtt{case}\ x\ \mathtt{of}\ \{\mathtt{zero}\Rightarrow\ \downarrow(),\ \mathtt{succ}(y)\Rightarrow loop\})\big)\big\rrbracket \;=\; update_{l_0,\overline{1}}\big(lookup_{l_0}(\downarrow,\ \bot,\ \bot,\ \dots)\big)$$

Only the second branch of $lookup_{l_0}$ can occur. The other branches are still present in the tree because $\llbracket - \rrbracket$ treats effect operations as uninterpreted syntax.

*Example 5 (Interactive input/output).* <sup>Σ</sup> <sup>=</sup> {↓ : ()→R, output : (nat,()→R)→R, input : (nat→R)→R}. Intuitively, the computation input(k) accepts number <sup>n</sup> from the input channel and continues with k (n). The computation output(v, k) writes v to the output channel then continues with computation k (). Below is a computation that inputs a number n then outputs it immediately, and repeats.

$$\mathtt{echo} = \big(\mathtt{rec}\ f.\ \lambda().\ \mathtt{input}(\lambda x{:}\mathtt{nat}.\ \mathtt{output}(x,\ \lambda().\,f\,()))\big)\ ()$$

$$\llbracket\mathtt{echo}\rrbracket = input\big(output_{\overline{0}}(\llbracket\mathtt{echo}\rrbracket),\ output_{\overline{1}}(\llbracket\mathtt{echo}\rrbracket),\ output_{\overline{2}}(\llbracket\mathtt{echo}\rrbracket),\ \dots\big)$$

#### **2.2 Contextual Equivalence**

Informally, two terms are contextually equivalent if they have the same *observable behaviour* in all program contexts. The definition of observable behaviour depends on the programming language under consideration. In ECPS, we can observe effectful behaviour such as interactive output values or the probability with which a computation succeeds. This behaviour is encoded by the effect tree of a computation. Therefore, we represent an ECPS observation as a set of effect trees $P$. A computation $t$ exhibits observation $P$ if $\llbracket t\rrbracket \in P$.

For a fixed set of effect operations Σ, we define the set P of possible *observations*. The elements of P are subsets of *Trees*Σ. Observations play a similar role to the modalities from [38]. For our running examples of effects, P is defined as follows:

*Example 6 (Pure functional computation).* Define P = {⇓} where ⇓ = {↓}. There are no effect operations so the ⇓ observation only checks for success.

*Example 7 (Nondeterminism).* Define $\mathfrak P = \{\Diamond, \Box\}$ where:

$$\Diamond = \{tr \in Trees_\Sigma \mid \text{at least one of the paths in } tr \text{ has a } \downarrow \text{ leaf}\}$$

$$\Box = \{tr \in Trees_\Sigma \mid \text{all paths in } tr \text{ are finite and end with a } \downarrow \text{ leaf}\}$$

The intuition is that, if $\llbracket t\rrbracket \in \Diamond$, then computation $t$ *may* succeed, whereas if $\llbracket t\rrbracket \in \Box$, then $t$ *must* succeed.
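On finite trees, the may and must observations become simple recursions. This Python sketch reuses the illustrative tuple encoding (leaves `"down"` for $\downarrow$, `"bot"` for $\bot$) and checks a tree shaped like the one in Example 2:

```python
# May/must success on finite effect trees encoded as tuples
# ("or", left, right) with leaves "down" (↓) and "bot" (⊥).

def may_succeed(tr):
    # tr ∈ ♦: some path reaches a ↓ leaf.
    if tr == "down":
        return True
    return tr != "bot" and any(may_succeed(c) for c in tr[1:])

def must_succeed(tr):
    # tr ∈ □: every path ends in a ↓ leaf.
    if tr == "down":
        return True
    return tr != "bot" and all(must_succeed(c) for c in tr[1:])

# A tree of the shape or(↓, or(⊥, ↓)): may succeed, but one path diverges.
tr = ("or", "down", ("or", "bot", "down"))
print(may_succeed(tr), must_succeed(tr))
```

The infinite trees of the paper need the least-upper-bound machinery; the finite case is enough to see why ♦ and □ differ.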

*Example 8 (Probabilistic choice).* Define $\mathbb P : Trees_\Sigma \longrightarrow [0, 1]$ to be the least function, by the pointwise order, such that:

$$\mathbb P(\downarrow) = 1 \qquad\qquad \mathbb P(p\text{-}or(tr_0, tr_1)) = \tfrac{1}{2}\,\mathbb P(tr_0) + \tfrac{1}{2}\,\mathbb P(tr_1).$$

Notice that $\mathbb P(\bot) = 0$. Then observations are defined as:

$$\mathbf P_{>q} = \{tr \in Trees_\Sigma \mid \mathbb P(tr) > q\} \qquad\qquad \mathfrak P = \{\mathbf P_{>q} \mid q \in \mathbb Q,\ 0 \le q < 1\}.$$

This means that $\llbracket t\rrbracket \in \mathbf P_{>q}$ if the probability that $t$ succeeds is greater than $q$.
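Restricted to finite trees, ℙ is a direct recursion, and the geom tree can be approximated by finite cut-offs. The `Fraction`-based encoding and the names below are illustrative assumptions:

```python
from fractions import Fraction

# ℙ on finite trees: tuples ("p-or", left, right) with leaves
# "down" (↓, probability 1) and "bot" (⊥, probability 0).

def prob(tr):
    if tr == "down":
        return Fraction(1)
    if tr == "bot":
        return Fraction(0)
    _, left, right = tr
    return Fraction(1, 2) * (prob(left) + prob(right))

def geom_approx(depth):
    # depth-n cut-off of the infinite tree p-or(↓, p-or(↓, ...)):
    # success probability 1 - 2**(-depth).
    return "bot" if depth == 0 else ("p-or", "down", geom_approx(depth - 1))

print(prob(geom_approx(10)))                     # 1023/1024
print(prob(geom_approx(10)) > Fraction(9, 10))   # geom_approx(10) ∈ P_{>9/10}
```

Taking the least upper bound over all depths recovers the 'almost sure' success of geom used later in Sect. 2.2.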

*Example 9 (Global store).* Define the set of states as the set of functions from storage locations to natural numbers: State <sup>=</sup> <sup>L</sup> −→ <sup>N</sup>. Given a state <sup>S</sup>, we write [S↓] <sup>⊆</sup> *Trees*<sup>Σ</sup> for the set of effect trees that terminate when starting in state <sup>S</sup>. More precisely, [−] is the least State-indexed family of sets satisfying the following:

$$\frac{}{\downarrow\,\in [S\downarrow]} \qquad\qquad \frac{l \in \mathbb L \quad tr_{S(l)} \in [S\downarrow]}{lookup_l(tr_0, tr_1, tr_2, \dots) \in [S\downarrow]} \qquad\qquad \frac{l \in \mathbb L \quad tr \in [S[l := n]\downarrow]}{update_{l,\overline n}(tr) \in [S\downarrow]}$$

The set of observations is: <sup>P</sup> <sup>=</sup> {[S↓] <sup>|</sup> <sup>S</sup> <sup>∈</sup> State}.
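A finite-tree sketch of membership in $[S\downarrow]$: the dictionary state and the finite list of lookup children are illustrative stand-ins for the infinite branching of the text:

```python
# Finite global-store trees: ("lookup", loc, [tr_0, tr_1, ...]),
# ("update", loc, n, tr), leaves "down" (↓) and "bot" (⊥).
# A state is a dict from locations to naturals.

def terminates(tr, state):
    """Does tr belong to [state ↓], i.e. terminate when run in state?"""
    if tr == "down":
        return True
    if tr == "bot":
        return False
    if tr[0] == "lookup":
        _, loc, children = tr
        n = state[loc]
        # Follow the branch for the value currently stored at loc.
        return n < len(children) and terminates(children[n], state)
    if tr[0] == "update":
        _, loc, n, child = tr
        # Continue in the updated state.
        return terminates(child, {**state, loc: n})
    return False

# update_{l0,1}(lookup_{l0}(⊥, ↓)): after writing 1, only branch 1 matters.
tr = ("update", "l0", 1, ("lookup", "l0", ["bot", "down"]))
print(terminates(tr, {"l0": 0}))
```

This mirrors the three rules: ↓ terminates, a lookup follows the branch indexed by the stored value, and an update continues in the modified state.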

*Example 10 (Interactive input/output).* An I/O-trace is a finite word $w$ over the alphabet $\{?n \mid n \in \mathbb N\} \cup \{!n \mid n \in \mathbb N\}$. For example, $?1\ !1\ ?2\ !2\ ?3\ !3$. The set of observations is: $\mathfrak P = \{\langle W\rangle_{\dots},\ \langle W\rangle{\downarrow} \mid W \text{ an I/O-trace}\}$. Observations are defined as the least sets satisfying the following rules:

$$\frac{}{tr \in \langle\epsilon\rangle_{\dots}} \qquad \frac{tr =\ \downarrow}{tr \in \langle\epsilon\rangle{\downarrow}} \qquad \frac{tr_n \in \langle W\rangle_{\dots}}{input(tr_0, tr_1, \dots) \in \langle(?n)W\rangle_{\dots}} \qquad \frac{tr' \in \langle W\rangle_{\dots}}{output_{\overline n}(tr') \in \langle(!n)W\rangle_{\dots}}$$

and the analogous rules for $\langle(?n)W\rangle{\downarrow}$ and $\langle(!n)W\rangle{\downarrow}$. Thus, $\llbracket t\rrbracket \in \langle W\rangle_{\dots}$ if computation $t$ produces I/O trace $W$, and $\llbracket t\rrbracket \in \langle W\rangle{\downarrow}$ if additionally $t$ succeeds immediately after producing $W$.
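The trace observations can also be sketched on finite trees. The encoding below (a finite input-children list standing in for infinite branching, traces as lists of `("?", n)` and `("!", n)` pairs) is an illustrative assumption:

```python
# Finite I/O trees: ("input", [tr_0, tr_1, ...]), ("output", n, tr),
# leaves "down" (↓) and "bot" (⊥). A trace is a list of ("?", n) / ("!", n).

def produces(tr, w, must_end_down=False):
    """Membership in ⟨W⟩… (must_end_down=False) or ⟨W⟩↓ (True)."""
    if not w:
        # ⟨ε⟩… holds for every tree; ⟨ε⟩↓ only for the ↓ leaf.
        return tr == "down" if must_end_down else True
    kind, n = w[0]
    if kind == "?" and isinstance(tr, tuple) and tr[0] == "input":
        children = tr[1]
        # Follow the branch where input n was received.
        return n < len(children) and produces(children[n], w[1:], must_end_down)
    if kind == "!" and isinstance(tr, tuple) and tr[0] == "output":
        # The tree must output exactly n here.
        return tr[1] == n and produces(tr[2], w[1:], must_end_down)
    return False

# A finite prefix of ⟦echo⟧: read a number (0 or 1 here), then echo it.
echo2 = ("input", [("output", 0, "bot"), ("output", 1, "bot")])
print(produces(echo2, [("?", 1), ("!", 1)]))        # trace ?1 !1 is produced
print(produces(echo2, [("?", 1), ("!", 1)], True))  # but the tree ends in ⊥
```

The two recursions mirror the rules for $\langle W\rangle_{\dots}$ and $\langle W\rangle{\downarrow}$ above, one trace symbol per tree layer.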

Using the set of observations P, we can now define contextual equivalence as the greatest compatible and adequate equivalence relation between possibly open terms of the same type. Adequacy specifies a necessary condition for two *closed* computations to be related, namely producing the same observations.

**Definition 1.** *A well-typed relation $\mathcal R = (\mathcal R^{\mathtt v}_A, \mathcal R^{\mathtt c})$ (i.e. a family of relations indexed by ECPS types, where $\mathcal R^{\mathtt c}$ relates computations) on possibly open terms is* adequate *if:*

$$\forall s, t.\ \ \vdash_\Sigma s\ \mathcal R^{\mathtt c}\ t \implies \forall P \in \mathfrak P.\ \big(\llbracket s\rrbracket \in P \Longleftrightarrow \llbracket t\rrbracket \in P\big).$$

*Relation* R *is compatible if it is closed under the rules in [21, Page 57]. As an example, the rules for application and lambda abstraction are:*

$$\frac{\Gamma \vdash_\Sigma v\ \mathcal R^{\mathtt v}_{(\overrightarrow A)\to\mathtt R}\ v' \qquad (\Gamma \vdash_\Sigma w_i\ \mathcal R^{\mathtt v}_{A_i}\ w'_i)_i}{\Gamma \vdash_\Sigma v(\overrightarrow w)\ \mathcal R^{\mathtt c}\ v'(\overrightarrow{w'})} \qquad\qquad \frac{\Gamma, \overrightarrow{x : A} \vdash_\Sigma s\ \mathcal R^{\mathtt c}\ t}{\Gamma \vdash_\Sigma \lambda(\overrightarrow{x{:}A}).s\ \mathcal R^{\mathtt v}_{(\overrightarrow A)\to\mathtt R}\ \lambda(\overrightarrow{x{:}A}).t}$$

**Definition 2 (Contextual equivalence).** *Let* CA *be the set of well-typed relations on possibly open terms that are both compatible and adequate. Define contextual equivalence* $\equiv_{\mathrm{ctx}}$ *to be* $\bigcup CA$*.*

**Proposition 1.** *Contextual equivalence* ≡ctx *is an equivalence relation, and is moreover compatible and adequate.*

This definition of contextual equivalence, originally proposed in [11,19], can be easily proved equivalent to the traditional definition involving program contexts (see [21, §7]). As Pitts observes [30], reasoning about program contexts directly is inconvenient because they cannot be defined up to alpha-equivalence, hence we prefer using Definition 2.

For example, in the pure setting (Example 1), we have $\overline 0 \not\equiv_{\mathrm{ctx}} \overline 1$, because $\mathtt{test\_zero}(\overline 0) \not\equiv_{\mathrm{ctx}} \mathtt{test\_zero}(\overline 1)$: they are distinguished by the observation $\Downarrow$. In the state example, $\mathtt{lookup}_{l_1}(k) \not\equiv_{\mathrm{ctx}} \mathtt{lookup}_{l_2}(k)$, because they are distinguished by the context $(\lambda k{:}\mathtt{nat}{\to}\mathtt{R}.\ [-])\ (\mathtt{test\_zero})$ and the observation $[S\downarrow]$ where $S(l_1) = 0$ and $S(l_2) = 1$. In the case of probabilistic choice (Example 3), $\mathtt{geom}\ (\lambda x{:}\mathtt{nat}.\ \downarrow()) \equiv_{\mathrm{ctx}}\ \downarrow()$ because $(\mathtt{geom}\ (\lambda x{:}\mathtt{nat}.\ \downarrow()))$ succeeds with probability 1 ('almost surely').

# **3 A Program Logic for ECPS –** *F*

This section contains the main contribution of the paper: a logic F of program properties for ECPS which characterizes contextual equivalence. Crucially, the logic makes use of the observations in P to express properties of computations.

In $\mathcal F$, there is a distinction between formulas that describe values and formulas that describe computations. Each value formula is associated with an ECPS type $A$. Value formulas are constructed from the basic formulas $(\phi_1,\dots,\phi_n) \mapsto P$ and $\{n\}$, where $n \in \mathbb N$ and $P \in \mathfrak P$, as below. The indexing set $I$ can be infinite, even uncountable. Computation formulas are simply the elements of $\mathfrak P$.

$$\frac{n \in \mathbb{N}}{\{n\} : \mathtt{nat}} \qquad \frac{\phi_1 : A_1 \ \ldots \ \phi_n : A_n \quad P \in \mathcal{P}}{(\phi_1, \ldots, \phi_n) \mapsto P : (A_1, \ldots, A_n) \to \mathsf{R}}\ (\textsc{val}) \qquad \frac{(\phi_i : A)_{i \in I}}{\bigvee_{i \in I} \phi_i : A} \qquad \frac{(\phi_i : A)_{i \in I}}{\bigwedge_{i \in I} \phi_i : A} \qquad \frac{\phi : A}{\neg \phi : A}$$

The satisfaction relation $\models_{\mathcal{F}}$ relates a closed value $\vdash_\Sigma v : A$ to a value formula $\phi : A$ of the same type, and a closed computation $t$ to an observation $P$. The relation $t \models_{\mathcal{F}} P$ tests the shape of the effect tree of $t$.

$$\begin{aligned}
v \models_{\mathcal{F}} \{n\} &\iff v = \overline{n} \\
v \models_{\mathcal{F}} (\phi_1, \ldots, \phi_n) \mapsto P &\iff \text{for all closed values } w_1, \ldots, w_n \text{ such that} \\
&\qquad\quad \forall i.\ w_i \models_{\mathcal{F}} \phi_i, \text{ we have } v(w_1, \ldots, w_n) \models_{\mathcal{F}} P \\
v \models_{\mathcal{F}} \textstyle\bigvee_{i \in I} \phi_i &\iff \text{there exists } j \in I \text{ such that } v \models_{\mathcal{F}} \phi_j \\
v \models_{\mathcal{F}} \textstyle\bigwedge_{i \in I} \phi_i &\iff \text{for all } j \in I,\ v \models_{\mathcal{F}} \phi_j \\
v \models_{\mathcal{F}} \neg\phi &\iff \text{it is false that } v \models_{\mathcal{F}} \phi \\
t \models_{\mathcal{F}} P &\iff |t| \in P.
\end{aligned}$$
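To make the satisfaction clauses concrete, here is a small sketch in Python modelling effect trees for the nondeterminism signature, together with the may- and must-termination observations used later in Example 13. All names (`Tree`, `may`, `must`, …) are ours; the paper contains no such code.

```python
from dataclasses import dataclass
from typing import Tuple

# Effect trees for the signature {or}: leaves are success (down) or
# divergence (bot); an "or" node branches nondeterministically.
@dataclass(frozen=True)
class Tree:
    label: str                          # "down", "bot", or "or"
    children: Tuple["Tree", ...] = ()

DOWN, BOT = Tree("down"), Tree("bot")

def or_(left: Tree, right: Tree) -> Tree:
    return Tree("or", (left, right))

def may(tr: Tree) -> bool:
    """Observation 'diamond': some branch reaches a down-leaf."""
    if tr.label == "down":
        return True
    return any(may(c) for c in tr.children)

def must(tr: Tree) -> bool:
    """Observation 'box': every branch reaches a down-leaf (finite trees)."""
    if tr.label == "down":
        return True
    if tr.label == "bot":
        return False
    return all(must(c) for c in tr.children)

# t |= P is |t| in P: this tree may succeed, but is not guaranteed to.
t = or_(DOWN, or_(BOT, DOWN))
assert may(t) and not must(t)
```

The two predicates are exactly the tree-shape tests that the clause $t \models_{\mathcal{F}} P \iff |t| \in P$ delegates to the chosen set of observations.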

*Example 11.* Consider the following formulas, where only $\phi_3$ and $\phi_4$ refer to the same effect context:

$$\begin{aligned}
\phi_1 &= \big( (\{3\} \mapsto {\Downarrow}) \wedge (\{4\} \mapsto {\Downarrow}) \big) \mapsto {\Downarrow} \\
\phi_2 &= \big( (\textstyle\bigvee_{n>1} \{n\}) \mapsto \mathbf{P}_{>q} \big) \mapsto \mathbf{P}_{>q/2} \\
\phi_3 &= \textstyle\bigwedge_{S \in \mathit{State}} \big( (\{S(l)\} \mapsto [S{\downarrow}]) \mapsto [S{\downarrow}] \big) \\
\phi_4 &= \textstyle\bigwedge_{S \in \mathit{State}} \bigwedge_{n \in \mathbb{N}} \big( (\{n\} \mapsto [S[l_0 := n, l_1 := n+1]{\downarrow}]) \mapsto [S[l_0 := n]{\downarrow}] \big) \\
\phi_5 &= \textstyle\bigwedge_{k \in \mathbb{N}} \bigvee_{n_1, \ldots, n_k \in \mathbb{N}} \big( () \mapsto (?n_1)(!n_1) \cdots (?n_k)(!n_k) \ldots \big)
\end{aligned}$$

Given a function $v : (\mathtt{nat} \to \mathsf{R}) \to \mathsf{R}$, $v \models_{\mathcal{F}} \phi_1$ means that $v$ is guaranteed to call its argument only with 3 or 4. The function geom from Example 3 satisfies $\phi_2$ because with probability $1/2$ it passes to the continuation a number $n > 1$.
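The probability claim about geom can be checked numerically. The snippet below is ours, and it assumes (as in Example 3) that geom passes $n$ to its continuation with probability $2^{-n}$:

```python
# Probability that geom hands its continuation some n > 1:
# sum over n >= 2 of 2^(-n) = 1/2 (truncated at n = 60; the tail is negligible).
p_gt_1 = sum(2.0 ** -n for n in range(2, 60))
assert abs(p_gt_1 - 0.5) < 1e-12

# Total success probability is 1, i.e. geom succeeds 'almost surely',
# matching the contextual-equivalence example geom (fun x. down ()) = down ().
p_total = sum(2.0 ** -n for n in range(1, 60))
assert abs(p_total - 1.0) < 1e-12
```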

For example, the following satisfactions hold: $\lambda k{:}\mathtt{nat}{\to}\mathsf{R}.\ \mathrm{lookup}_l(k) \models_{\mathcal{F}} \phi_3$ and $f = \lambda(x{:}\mathtt{nat},\ k{:}(){\to}\mathsf{R}).\ \mathrm{update}_{l_1}(\mathrm{succ}(x), k) \models_{\mathcal{F}} \phi_4$. The latter formula says that either $f$ always succeeds, or $f$ evaluated with $n$ changes the state from $S[l_0 := n]$ to $S[l_0 := n, l_1 := n+1]$ before calling its continuation. This is similar to a total correctness assertion $[S[l_0 := n]]\ ({-})\ [S[l_0 := n, l_1 := n+1]]$ from Hoare logic, for a direct-style program. Formula $\phi_5$ is satisfied by $\lambda().\mathrm{echo}$, where echo is the computation defined in Example 5.

Even though the indexing set $I$ in $\bigwedge_{i \in I}$ and $\bigvee_{i \in I}$ may be uncountable, the sets of values and computations are countable. Since logical formulas are interpreted over values and computations, all conjunctions and disjunctions are logically equivalent to countable ones.

**Definition 3 (Logical equivalence).** *For any closed values* $\vdash_\Sigma v_1 : A$ *and* $\vdash_\Sigma v_2 : A$*, and for any closed computations* $\vdash_\Sigma s_1$ *and* $\vdash_\Sigma s_2$*:*

$$\begin{aligned} v_1 \equiv_{\mathcal{F}} v_2 &\iff \forall \phi : A \text{ in } \mathcal{F}.\ (v_1 \models_{\mathcal{F}} \phi \iff v_2 \models_{\mathcal{F}} \phi),\\ s_1 \equiv_{\mathcal{F}} s_2 &\iff \forall P \text{ in } \mathcal{F}.\ (s_1 \models_{\mathcal{F}} P \iff s_2 \models_{\mathcal{F}} P). \end{aligned}$$

To facilitate equational reasoning, logical equivalence should be compatible, a property proved in the next section (Proposition 3). Compatibility allows substitution of related programs for a free variable that appears on both sides of a program equation. Notice that logical equivalence would not be changed if we added conjunction, disjunction and negation at the level of computation formulas. We have omitted such connectives for simplicity.

To state our main theorem, first define the open extension of a well-typed relation $R$ on closed terms: $\vec{x} : \vec{A} \vdash_\Sigma t \; R^\circ \; s$ if and only if for any closed values $(\vdash_\Sigma v_i : A_i)_i$, $t[\vec{v}/\vec{x}] \; R \; s[\vec{v}/\vec{x}]$. Three sufficient conditions that we impose on the set of observations P are defined below. The first one, consistency, ensures that contextual equivalence can distinguish at least two programs.

**Definition 4 (Consistency).** *A set of observations* P *is consistent if there exists at least one observation* <sup>P</sup><sup>0</sup> <sup>∈</sup> <sup>P</sup> *such that:*


**Definition 5 (Scott-openness).** *A set of trees* X *is Scott-open if it is upwards closed, that is,* $tr \in X$ *and* $tr \sqsubseteq tr'$ *imply* $tr' \in X$*, and whenever the least upper bound of an ascending chain* $tr_1 \sqsubseteq tr_2 \sqsubseteq \cdots$ *lies in* X*, then* $tr_n \in X$ *for some* $n$*.*


**Definition 6 (Decomposability).** *The set of observations* P *is decomposable if for any* <sup>P</sup> <sup>∈</sup> <sup>P</sup>*, and for any* tr <sup>∈</sup> <sup>P</sup>*:*

$$\begin{array}{l} \forall \sigma \in \Sigma.\ \big(tr = \sigma_{\vec{v}}(\vec{tr'}) \implies\\ \qquad \exists \vec{P'} \in \overrightarrow{\mathcal{P} \cup \{\mathit{Trees}_\Sigma\}}.\ \vec{tr'} \in \vec{P'} \ \text{and}\ \forall \vec{p'} \in \vec{P'}.\ \sigma_{\vec{v}}(\vec{p'}) \in P\big). \end{array}$$

**Theorem 1 (Soundness and Completeness of** F**).** *For a decomposable set of Scott-open observations* P *that is consistent, the open extension of* F*-logical equivalence coincides with contextual equivalence:* $(\equiv^\circ_{\mathcal{F}}) = (\equiv_{\mathrm{ctx}})$*.*

The proof of this theorem is outlined in Sect. 4. It is easy to see that for all running examples of effects the set <sup>P</sup> is consistent. The proof that each <sup>P</sup> <sup>∈</sup> <sup>P</sup> is Scott-open is similar to that for modalities from [38]. It remains to show that for all our examples P is decomposable. Intuitively, decomposability can be understood as saying that logical equivalence is a congruence for the effect context Σ.

*Example 12 (Pure functional computation).* The only observation is ⇓ = {↓}. There are no trees in ⇓ whose root has children, so decomposability is satisfied.

*Example 13 (Nondeterminism).* Consider $tr \in \Diamond$. Either $tr = {\downarrow}$, in which case we are done, or $tr = \mathrm{or}(tr_0, tr_1)$. It must be the case that either $tr_0$ or $tr_1$ has a ${\downarrow}$-leaf. Without loss of generality, assume this is the case for $tr_0$. Then we know $tr_0 \in \Diamond$, so we can choose $P'_0 = \Diamond$ and $P'_1 = \mathit{Trees}_\Sigma$. For any $\vec{p} \in \vec{P'}$ we know $\mathrm{or}(\vec{p}) \in \Diamond$ because $p_0$ has a ${\downarrow}$-leaf, so decomposability holds. The argument for $tr \in \Box$ is analogous: choose $P'_0 = P'_1 = \Box$.

*Example 14 (Probabilistic choice).* Consider $tr = \text{p-or}(tr_0, tr_1) \in \mathbf{P}_{>q}$. Choose:
$$q_0 = \frac{\mathbb{P}(tr_0)}{\mathbb{P}(tr_0) + \mathbb{P}(tr_1)} \cdot 2q \qquad\text{and}\qquad q_1 = \frac{\mathbb{P}(tr_1)}{\mathbb{P}(tr_0) + \mathbb{P}(tr_1)} \cdot 2q.$$
From $\mathbb{P}(tr) = \frac{1}{2}(\mathbb{P}(tr_0) + \mathbb{P}(tr_1)) > q$ we can deduce that $1 \geq \mathbb{P}(tr_0) > q_0$ and $1 \geq \mathbb{P}(tr_1) > q_1$. So we can choose $P'_0 = \mathbf{P}_{>q_0}$ and $P'_1 = \mathbf{P}_{>q_1}$ to satisfy decomposability.
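The arithmetic behind this choice can be checked numerically. The helper below and its sample values are ours, not from the paper:

```python
def decompose(p0: float, p1: float, q: float):
    """Thresholds from Example 14, given subtree success probabilities
    p0 = P(tr_0), p1 = P(tr_1) with (p0 + p1)/2 > q."""
    q0 = p0 / (p0 + p1) * 2 * q
    q1 = p1 / (p0 + p1) * 2 * q
    return q0, q1

p0, p1, q = 0.9, 0.4, 0.6            # (p0 + p1) / 2 = 0.65 > q
q0, q1 = decompose(p0, p1, q)

assert p0 > q0 and p1 > q1           # each subtree is observed by P_{>q_i}
assert q0 <= 1 and q1 <= 1           # thresholds are genuine probabilities
# Any p0' > q0 and p1' > q1 recombine to (p0' + p1')/2 > q, since q0 + q1 = 2q:
assert abs((q0 + q1) - 2 * q) < 1e-12
```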

*Example 15 (Global store).* Consider a tree $tr = \sigma_{\vec{v}}(tr_0, tr_1, tr_2, \ldots) \in [S{\downarrow}]$. If $\sigma = \mathrm{lookup}_l$, then we know $tr_{S(l)} \in [S{\downarrow}]$. In the definition of decomposability, choose $P'_{S(l)} = [S{\downarrow}]$ and $P'_k = \mathit{Trees}_\Sigma$ for $k \neq S(l)$, and we are done. If $\sigma_{\vec{v}} = \mathrm{update}_{l,n}$, then $tr_0 \in [S[l := n]{\downarrow}]$. Choose $P'_0 = [S[l := n]{\downarrow}]$.

*Example 16 (Interactive input/output).* Consider an I/O trace $W$ and a tree $tr = \sigma_{\vec{v}}(tr_0, tr_1, tr_2, \ldots) \in W{\ldots}$. If $\sigma = \mathrm{input}$, it must be the case that $W = (?k)W'$ and $tr_k \in W'{\ldots}$. We can choose $P'_k = W'{\ldots}$ and $P'_m = \mathit{Trees}_\Sigma$ for $m \neq k$, and we are done. If $\sigma_{\vec{v}} = \mathrm{output}_n$, then $W = (!n)W'$ and $tr_0 \in W'{\ldots}$. Choose $P'_0 = W'{\ldots}$ and we are done. The proof for $W{\downarrow}$ is analogous.
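As a concrete illustration of the trace observations $W{\ldots}$, the sketch below checks whether a trace has the shape produced by the echo computation of Example 5, i.e. alternating $?n\,!n$ pairs. The representation of traces and the helper are ours:

```python
# A trace step is ("?", n) for input or ("!", n) for output.
def echo_has_trace(trace) -> bool:
    """Does the echo tree have `trace` as a prefix of one of its paths?"""
    expecting_output_of = None           # set after an input step
    for (direction, n) in trace:
        if expecting_output_of is None:
            if direction != "?":
                return False             # echo always reads first
            expecting_output_of = n
        else:
            if direction != "!" or n != expecting_output_of:
                return False             # echo must echo the number just read
            expecting_output_of = None
    return True

assert echo_has_trace([("?", 7), ("!", 7), ("?", 0), ("!", 0)])
assert not echo_has_trace([("?", 7), ("!", 8)])
```

Since the input node branches over all natural numbers, the check succeeds for every choice of the input values, which is why $\lambda().\mathrm{echo}$ satisfies $\phi_5$.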

# **4 Soundness and Completeness of the Logic** *F*

This section outlines the proof of Theorem 1, which says that F-logical equivalence coincides with contextual equivalence. The full proof can be found in [21]. First, we define applicative bisimilarity for ECPS, similarly to the way Simpson and Voorneveld [38] define it for PCF with algebraic effects. Then, we prove in turn that F-logical equivalence coincides with applicative bisimilarity, and that applicative bisimilarity coincides with contextual equivalence. Thus, three notions of program equivalence for ECPS are in fact the same.

**Definition 7 (Applicative** P**-bisimilarity).** *A collection of relations* $R^v_A \subseteq (\vdash_\Sigma A)^2$ *for each type* $A$*, together with a relation* $R^c \subseteq (\vdash_\Sigma)^2$ *on computations, is an applicative* P*-*simulation *if:*

*1.* $v \; R^v_{\mathtt{nat}} \; w \implies v = w$*.*
*2.* $s \; R^c \; t \implies \forall P \in \mathcal{P}.\ (|s| \in P \implies |t| \in P)$*.*
*3.* $v \; R^v_{(\vec{A}) \to \mathsf{R}} \; u \implies \forall (\vdash_\Sigma w_i : A_i)_i.\ v(\vec{w}) \; R^c \; u(\vec{w})$*.*

*An applicative* P*-*bisimulation *is a symmetric simulation.* Bisimilarity*, denoted by* ∼*, is the union of all bisimulations; it is therefore the greatest applicative* P*-bisimulation.*

Notice that applicative bisimilarity uses the set of observations P to relate computations, just as contextual and logical equivalence do. It is easy to show that bisimilarity is an equivalence relation.

**Proposition 2.** *Given a decomposable set of Scott-open observations* P*, the open extension of applicative* P*-bisimilarity,* ∼◦*, is compatible.*

*Proof (notes).* This is proved using Howe's method [14], following the structure of the corresponding proof from [38]. Scott-openness is used to show that the observations in P interact well with the sequence of finite trees approximating the effect tree of each computation. For details see [21, §5.4].

**Proposition 3.** *Given a decomposable set of Scott-open observations* P*, applicative* P*-bisimilarity* ∼ *coincides with* F*-logical equivalence* ≡<sup>F</sup> *. Hence, the open extension of* F*-logical equivalence* ≡◦ <sup>F</sup> *is compatible.*

*Proof (sketch).* We define a new logic V which is almost the same as F except that the (val) rule is replaced by:

$$\frac{\vdash_\Sigma w_1 : A_1 \ \ldots \ \vdash_\Sigma w_n : A_n \quad P \in \mathcal{P}}{(w_1, \ldots, w_n) \mapsto P : (A_1, \ldots, A_n) \to \mathsf{R}} \qquad v \models_{\mathcal{V}} (\vec{w}) \mapsto P \iff v(\vec{w}) \models_{\mathcal{V}} P.$$

That is, formulas of function type are now constructed using ECPS values. It is relatively straightforward to show that V-logical equivalence coincides with applicative P-bisimilarity [21, Prop. 6.3.1]. However, we do not know of a similar direct proof for the logic F. From Proposition 2, we deduce that V-logical equivalence is compatible.

Next, we prove that the logics F and V are in fact equi-expressive, so they induce the same relation of logical equivalence on ECPS programs [21, Prop. 6.3.4]. Define a translation $(-)^\flat$ of formulas from F to V, and a translation $(-)^\sharp$ from V to F. The most interesting cases are those for formulas of function type:

$$((\phi_1, \ldots, \phi_n) \mapsto P)^\flat = \bigwedge \big\{ (w_1, \ldots, w_n) \mapsto P \ \big|\ w_1 \models_{\mathcal{V}} \phi_1^\flat, \ldots, w_n \models_{\mathcal{V}} \phi_n^\flat \big\}$$

$$((w\_1, \ldots, w\_n) \mapsto P)^\sharp = (\chi\_{w\_1}, \ldots, \chi\_{w\_n}) \mapsto P$$

where $\chi_{w_i}$ is the characteristic formula of $w_i$, that is, $\chi_{w_i} = \bigwedge \{\phi \mid w_i \models_{\mathcal{F}} \phi\}$. Equi-expressivity means that the satisfaction relation remains unchanged under both translations, for example $v \models_{\mathcal{V}} \phi^\flat \iff v \models_{\mathcal{F}} \phi$. Most importantly, the proof of equi-expressivity makes use of compatibility of $\equiv_{\mathcal{V}}$, which we established previously. For a full proof see [21, Prop. 6.2.3].

Finally, to prove Theorem 1 we show that applicative P-bisimilarity coincides with contextual equivalence [21, Prop. 7.2.2]:

**Proposition 4.** *Consider a decomposable set* P *of Scott-open observations that is consistent. The open extension of applicative* P*-bisimilarity* ∼◦ *coincides with contextual equivalence* ≡ctx*.*

*Proof (sketch).* We prove $(\equiv_{\mathrm{ctx}}) \subseteq (\sim^\circ)$ in two stages: first we show that the inclusion holds for closed terms, by proving that $\equiv_{\mathrm{ctx}}$ restricted to closed terms is a bisimulation; consistency of P is used in the case of natural numbers. Then we extend the inclusion to open terms using compatibility of $\equiv_{\mathrm{ctx}}$. The opposite inclusion follows immediately from compatibility and adequacy of $\sim^\circ$.

# **5 Related Work**

The work closest to ours is that by Simpson and Voorneveld [38]. In the context of a direct-style language with algebraic effects, EPCF, they propose a modal logic which characterizes applicative bisimilarity but not contextual equivalence. Consider the following example from [19] (we use simplified EPCF syntax):

$$M = \lambda(). \text{?} \text{nat} \qquad N = \text{let } y \Rightarrow \text{?} \text{nat } \text{in } \lambda(). \text{min}(\text{?} \text{nat}, \text{y}) \tag{1}$$

where ?nat is a computation, defined using or, which returns a natural number nondeterministically. Term <sup>M</sup> satisfies the formula <sup>Φ</sup> <sup>=</sup> ♦(true → ∧<sup>n</sup>∈<sup>N</sup>♦{n}) in the logic of [38], which says that M may return a function which in turn may return any natural number. However, N does not satisfy Φ because it always returns a *bounded* number generator G. The bound on G is arbitrarily high so M and N are contextually equivalent, since a context can only test a finite number of outcomes of G.

EPCF can be translated into ECPS via a continuation-passing translation that preserves the shape of computation trees. The translation maps a value $\Gamma \vdash V : \tau$ to a value $\Gamma^* \vdash V^* : \tau^*$. An EPCF computation $\Gamma \vdash M : \tau$ becomes an ECPS value $\Gamma^* \vdash M^* : (\tau^* \to \mathsf{R}) \to \mathsf{R}$, which intuitively is waiting for a continuation $k$ to pass its return result to (see [21, §4]). As an example, consider the cases for functions and application, where $k$ stands for a continuation:

$$\begin{aligned} (\Gamma \vdash \lambda x \colon \tau.M : \tau \to \rho)^{\*} &= \Gamma^{\*} \vdash \lambda (x \colon \tau^{\*}, k \colon \rho^{\*} \to \mathsf{R}). (M^{\*} \ k) : (\tau^{\*}, (\rho^{\*} \to \mathsf{R})) \to \mathsf{R} \\ (\Gamma \vdash V \ W : \rho)^{\*} &= \Gamma^{\*} \vdash \lambda k \colon \rho^{\*} \to \mathsf{R}. V^{\*} \ (W^{\*}, k) : (\rho^{\*} \to \mathsf{R}) \to \mathsf{R}. \end{aligned}$$
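The translation can be mimicked with Python closures standing in for ECPS values. This encoding is ours, not the paper's formal one; the point is only that a translated computation is a function waiting for a continuation:

```python
# Direct-style EPCF term: (lambda x. return (x + 1)) applied to 41.
# Its CPS translation, following the two displayed cases:
#   (lambda x. M)* = lambda (x, k). M* k    -- the body gets the continuation
#   (V W)*         = lambda k. V*(W*, k)    -- application hands k onwards

fun_star = lambda x, k: k(x + 1)        # body passes its result straight to k
app_star = lambda k: fun_star(41, k)    # the application awaits a continuation

result = []
app_star(result.append)                 # run with a "final" continuation
assert result == [42]
```

Running `app_star` never "returns" in the direct-style sense; the answer only ever flows into whatever continuation is supplied, which is exactly the behaviour the next paragraph describes.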

This translation suggests that ECPS functions of type (A1,...,An)→<sup>R</sup> can be regarded as continuations that never return. In EPCF the CPS-style algebraic operations can be replaced by direct-style generic effects [34], e.g. *input*() : nat.

One way to understand this CPS translation is that it arises from the fact that $((-) \to \mathsf{R}) \to \mathsf{R}$ is a monad on the multicategory of values (in a suitable sense, e.g. [40]), which means that we can use the standard monadic interpretation of a call-by-value language. As usual, the algebraic structure on the return type R induces an algebraic structure on the entire monad (see e.g. [16], [24, §8]). We have not taken a denotational perspective in this paper, but for the reader with this perspective, a first step is to note that the quotient set $Q \stackrel{\mathrm{def}}{=} (\mathit{Trees}_\Sigma)/{\equiv_{\mathcal{P}}}$ is a $\Sigma$-algebra, where $tr \equiv_{\mathcal{P}} tr'$ if and only if $\forall P \in \mathcal{P}.\ (tr \in P \iff tr' \in P)$; decomposability implies that $(\equiv_{\mathcal{P}})$ is a $\Sigma$-congruence. This thus induces a CPS monad $Q^{Q^{(-)}}$ on the category of cpos.

Note that the terms in (1) above are an example of programs that are not bisimilar in EPCF but become bisimilar when translated to ECPS. This is because in ECPS bisimilarity, like contextual and logical equivalence, uses continuations to test return results. Therefore, in ECPS we cannot test for all natural numbers, like formula Φ does. This example provides an intuition of why we were able to show that all three notions of equivalence coincide, while [38] was not.

The modalities in Simpson's and Voorneveld's logic are similar to the observations from P, because they also specify shapes of effect trees. Since EPCF computations have a return value, a modality is used to *lift* a formula about the return values to a computation formula. In contrast, in the logic F observations alone suffice to specify properties of computations. From this point of view, our use of observations is closer to that found in the work of Johann et al. [17]. This use of observations also leads to a much simpler notion of decomposability (Definition 6) than that found in [38].

It can easily be shown that for the running examples of effects, F-logical equivalence induces the program equations which are usually used to axiomatize algebraic effects, for example the equations for global store from [33]. Thus our choice of observations is justified further.

A different logic for algebraic effects was proposed by Plotkin and Pretnar [35]. It has a modality for each effect operation, whereas observations in P are determined by the behaviour of effects, rather than by the syntax of their operations. Plotkin and Pretnar prove that their logic is sound for establishing several notions of program equivalence, but not complete in general. Refinement types are yet another approach to specifying the behaviour of algebraic effects, (e.g. [3]). Several monadic-based logics for computational effects have been proposed, such as [10], [29], although without the focus on contextual equivalence.

A logic describing a higher-order language with local store is the Hoare logic of Yoshida, Honda and Berger [42]. Hoare logic has also been integrated into a type system for a higher-order functional language with dependent types, in the form of Hoare type theory [27]. Although we do not yet know how to deal with local state or dependent types in the logic F, an advantage of our logic over the previous two is that we describe different algebraic effects in a uniform manner.

Another aspect worth noticing is that some (non-trivial) F-formulas are not inhabited by any program. For example, there is no function $v : (() \to \mathsf{R}) \to \mathsf{R}$ satisfying:

$$\psi = \big( (() \mapsto (!0)\ldots) \mapsto (!1)\ldots \big) \wedge \big( (() \mapsto (!1)\ldots) \mapsto (!0)\ldots \big).$$

Formula ψ says that, if the first operation of a continuation is output(0), this is replaced by output(1) and vice-versa. But in ECPS, one cannot check whether an argument outputs something without also causing the output observation, and so the formula is never satisfied.

However, ψ could be inhabited if we extended ECPS to allow λ-abstraction over the symbols in the effect context Σ, and allowed such symbols to be *captured* during substitution (dynamic scoping). Consider the following example in an imaginary extended ECPS where we abstract over output:

$$\begin{aligned} h &= \lambda(x{:}\mathtt{nat},\ k{:}() \to \mathsf{R}).\ \mathtt{case}\ x\ \mathtt{of}\ \{\mathtt{zero} \Rightarrow \mathit{output}(\overline{1}, k),\\ &\hspace{10em} \mathtt{succ}(y) \Rightarrow \mathtt{case}\ y\ \mathtt{of}\ \{\mathtt{zero} \Rightarrow \mathit{output}(\overline{0}, k),\ \mathtt{succ}(z) \Rightarrow k\ ()\}\}\\ v &= \lambda f{:}() \to \mathsf{R}.\ \big( (\lambda \mathit{output}{:}(\mathtt{nat}, () \to \mathsf{R}) \to \mathsf{R}.\ (f\ ()))\ h \big). \end{aligned}$$

The idea is that during reduction of (v f), the output operations in f are captured by λoutput. Thus, output(0) operations from (f ()) are replaced by output(1) and vice-versa, and all other writes are skipped; so in particular <sup>v</sup> <sup>|</sup>=<sup>F</sup> <sup>ψ</sup>. This behaviour is similar to that of *effect handlers* [36]: computation (f ()) is being handled by handler h. We leave for future work the study of handlers in ECPS and of their corresponding logic.
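A Python analogue of this hypothetical extended-ECPS example: the operation `output` is passed to the computation explicitly, so wrapping a computation can swap outputs 0 and 1 and drop all other writes, just as the handler-like term $h$ does. All names here are ours:

```python
def make_v(real_output):
    def h(x, k):                    # plays the role of the handler h above
        if x == 0:
            real_output(1); k()     # output(0) becomes output(1)
        elif x == 1:
            real_output(0); k()     # output(1) becomes output(0)
        else:
            k()                     # all other writes are skipped
    def v(f):
        f(output=h)                 # f's output operations are captured by h
    return v

log = []
v = make_v(log.append)

def f(output):                      # a computation using `output` in CPS style
    output(0, lambda: output(5, lambda: output(1, lambda: None)))

v(f)
assert log == [1, 0]                # 0 -> 1, 5 dropped, 1 -> 0
```

Because `output` is an ordinary parameter of `f`, it is "dynamically scoped" in exactly the sense the text needs: the caller decides what the operation means.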

### **6 Concluding Remarks**

To summarize, we have studied program equivalence for a higher-order CPS language with general algebraic effects and general recursion (Sect. 2). Our main contribution is a logic F of program properties (Sect. 3) whose induced program equivalence coincides with contextual equivalence (Theorem 1; Sect. 4). Previous work on algebraic effects concentrated on logics that are sound for contextual equivalence, but not complete [35,38]. Moreover, F-logical equivalence also coincides with applicative bisimilarity for our language. We exemplified our results for nondeterminism, probabilistic choice, global store and I/O. A next step would be to consider local effects (e.g. [22,33,37,39]) or normal form bisimulation (e.g. [6]).

**Acknowledgements.** This research was supported by an EPSRC studentship, a Balliol College Jowett scholarship, and the Royal Society. We would like to thank Niels Voorneveld for pointing out example (1), Alex Simpson and Ohad Kammar for useful discussions, and the anonymous reviewers for comments and suggestions.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Equational Axiomatization of Algebras with Structure**

Stefan Milius and Henning Urbat

Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany henning.urbat@fau.de

**Abstract.** This paper proposes a new category theoretic account of equationally axiomatizable classes of algebras. Our approach is well-suited for the treatment of algebras equipped with additional computationally relevant structure, such as ordered algebras, continuous algebras, quantitative algebras, nominal algebras, or profinite algebras. Our main contributions are a generic HSP theorem and a sound and complete equational logic, which are shown to encompass numerous flavors of equational axiomatizations studied in the literature.

### **1 Introduction**

A key tool in the algebraic theory of data structures is their specification by operations (constructors) and equations that they ought to satisfy. Hence, the study of models of equational specifications has been of long-standing interest both in mathematics and computer science. The seminal result in this field is Birkhoff's celebrated HSP theorem [7]. It states that a class of algebras over a signature Σ is a *variety* (i.e. closed under homomorphic images, subalgebras, and products) iff it is axiomatizable by equations s = t between Σ-terms. Birkhoff also introduced a complete deduction system for reasoning about equations.

In algebraic approaches to the semantics of programming languages and computational effects, it is often natural to study algebras whose underlying sets are equipped with additional computationally relevant structure and whose operations preserve that structure. An important line of research thus concerns extensions of Birkhoff's theory of equational axiomatization beyond ordinary Σ-algebras. On the syntactic level, this requires enriching Birkhoff's notion of an equation in ways that reflect the extra structure. Let us mention a few examples:

(1) *Ordered algebras* (given by a poset and monotone operations) and *continuous algebras* (given by a complete partial order and continuous operations) were identified by the ADJ group [14] as an important tool in denotational semantics. Subsequently, Bloom [8] and Adámek, Nelson, and Reiterman [2,3]

S. Milius—Supported by Deutsche Forschungsgemeinschaft (DFG) under project MI 717/5-1.

H. Urbat—Supported by Deutsche Forschungsgemeinschaft (DFG) under project SCHR 1118/8-2.

© The Author(s) 2019

M. Bojańczyk and A. Simpson (Eds.): FOSSACS 2019, LNCS 11425, pp. 400–417, 2019. https://doi.org/10.1007/978-3-030-17127-8\_23

established ordered versions of the HSP theorem along with complete deduction systems. Here, the role of equations s = t is taken over by inequations s ≤ t.


The present paper proposes a general category theoretic framework that allows us to study classes of algebras with extra structure in a systematic way. Our overall goal is to isolate the domain-specific part of any theory of equational axiomatization from its generic core. Our framework is parametric in the following data:


Here, *A* is the category of algebras under consideration (e.g. ordered algebras, quantitative algebras, nominal algebras). Varieties are formed within *A*₀, and the cardinal numbers in Λ determine the arities of products under which the varieties are closed. Thus, the choice *A*₀ = finite algebras and Λ = finite cardinals corresponds to pseudovarieties, and *A*₀ = *A* and Λ = all cardinals to varieties. The crucial ingredient of our setting is the parameter *X*, which is the class of objects over which equations are formed; thus, typically, *X* is chosen to be some class of freely generated algebras in *A*. Equations are modeled as E-quotients e: X ↠ E (more generally, filters of such quotients) with domain X ∈ *X*.

The choice of *X* reflects the desired expressivity of equations in a given setting. Furthermore, it determines the type of quotients under which equationally axiomatizable classes are closed. More precisely, in our general framework a *variety* is defined to be a subclass of *A*₀ closed under E_X-quotients, M-subobjects, and Λ-products, where E_X is a subclass of E derived from *X*. Due to its parametric nature, this concept of a variety is widely applicable and turns out to specialize to many interesting cases. The main result of our paper is the

**General HSP Theorem.** *A subclass of A*<sup>0</sup> *forms a variety if and only if it is axiomatizable by equations.*

In addition, we introduce a generic deduction system for equations, based on two simple proof rules (see Sect. 4), and establish a

**General Completeness Theorem.** *The generic deduction system for equations is sound and complete.*

The above two theorems can be seen as the generic building blocks of the model theory of algebras with structure. They form the common core of numerous Birkhoff-type results and give rise to a systematic recipe for deriving concrete HSP and completeness theorems in settings such as (1)–(4). In fact, all that needs to be done is to translate our abstract notion of equation and equational deduction, which involves (filters of) quotients, into an appropriate syntactic concept. This is the domain-specific task to fulfill, and usually amounts to identifying an "exactness" property for the category *A* . Subsequently, one can apply our general results to obtain HSP and completeness theorems for the type of algebras under consideration. Several instances of this approach are shown in Sect. 5. Omitted proofs and details for the examples can be found in [20].

*Related work.* Generic approaches to universal algebra have a long tradition in category theory. They aim to replace syntactic notions like terms and equations by suitable categorical abstractions, most prominently Lawvere theories and monads [4,17]. Our present work draws much of its inspiration from the classical paper of Banaschewski and Herrlich [6] on HSP classes in (E,M)-structured categories. These authors were the first to model equations as quotients e: X ↠ E. However, their approach does not feature the parameter *X* and assumes that equations are formed over E-projective objects X. This limits the scope of their results to categories with enough projectives, a property that typically fails in categories of algebras with structure (including continuous, quantitative, or nominal algebras). The identification of the parameter *X* and of the derived parameter E_X as a key concept is thus a crucial step towards a categorical view of such structures.

Equational logics on the level of abstraction of Banaschewski and Herrlich's work were studied by Roşu [26,27] and Adámek, Hébert, and Sousa [1]. These authors work under assumptions on the category *A* different from our framework, e.g. they require the existence of pushouts. Hence, the proof rules and completeness results in *loc. cit.* are not directly comparable to our approach in Sect. 4.

In the present paper, we model equations as filters of quotients rather than single quotients, which allows us to encompass several HSP theorems for finite algebras [12,23,25]. The first categorical generalization of such results was given by Adámek, Chen, Milius, and Urbat [10,29], who considered algebras for a monad T on an algebraic category and modeled equations as filters of finite quotients of free T-algebras (equivalently, as profinite quotients of free profinite T-algebras). This idea was generalized by Salamánca [28] to monads on concrete categories. However, again, this work only applies to categories with enough projectives.

#### **2 Preliminaries**

We start by recalling some notions from category theory. A *factorization system* (E,M) in a category *A* consists of two classes E,M of morphisms in *A* such that (1) both E and M contain all isomorphisms and are closed under composition, (2) every morphism f has a factorization f = m · e with e ∈ E and m ∈ M, and (3) the *diagonal fill-in* property holds: for every commutative square g · e = m· f with e ∈ E and m ∈ M, there exists a unique d with m · d = g and d · e = f. The morphisms m and e in (2) are unique up to isomorphism and are called the *image* and *coimage* of f, resp. The factorization system is *proper* if all morphisms in E are epic and all morphisms in M are monic. From now on, we will assume that *A* is a category equipped with a proper factorization system (E,M). Quotients and subobjects in *A* are taken with respect to E and M. That is, a *quotient* of an object X is represented by a morphism e: X - E in E and a *subobject* by a morphism m: M X in M. The quotients of X are ordered by e ≤ e iff e factorizes through e, i.e. there exists a morphism h with e = h · e. Identifying quotients e and e which are isomorphic (i.e. e ≤ e and e ≤ e), this makes the quotients of X a partially ordered class. Given a full subcategory *A*<sup>0</sup> ⊆ *A* we denote by X *A*<sup>0</sup> the class of all quotients of X represented by E-morphisms with codomain in *A*0. The category *A* is E*-co-wellpowered* if for every object X ∈ *A* there is only a set of quotients with domain X. In particular, X *A*<sup>0</sup> is then a po*set*. Finally, an object X ∈ *A* is called *projective* w.r.t. a morphism e: A → B if for every h: X → B, there exists a morphism g : X → A with h = e · g.

#### **3 The Generalized Variety Theorem**

In this section, we introduce our categorical notions of equation and variety, and derive the HSP theorem. Fix a category *A* with a proper factorization system (E,M), a full subcategory *A*<sup>0</sup> ⊆ *A* , a class Λ of cardinal numbers, and a class *X* ⊆ *A* of objects. An object of *A* is called *X -generated* if it is a quotient of some object in *X* . A key role will be played by the subclass E*<sup>X</sup>* ⊆ E defined by

$$\mathcal{E}\_{\mathcal{X}} = \{ e \in \mathcal{E} \; : \text{ every } X \in \mathcal{X}^\* \text{ is projective w.r.t. } e \}.$$

Note that *X* ⊆ *X* implies E*<sup>X</sup>* - ⊆ E*<sup>X</sup>* . The choice of *X* is a trade-off between "having enough equations" (that is, *X* needs to be rich enough to make equations sufficiently expressive) and "having enough projectives" (cf. (3) below).

**Assumptions 3.1.** Our data is required to satisfy the following properties:


**Example 3.2.** Throughout this section, we will use the following three running examples to illustrate our concepts. For further applications, see Sect. 5.

	- *A* = *A*<sup>0</sup> = **Alg**(Σ);
	- (E,M) = (surjective morphisms, injective morphisms);
	- Λ = all cardinal numbers;

– *X* = all free Σ-algebras TΣX with X ∈ **Set**.

One easily verifies that E*<sup>X</sup>* consists of all surjective morphisms, that is, E*<sup>X</sup>* = E.

	- *A* = **Alg**(Σ) and *A*<sup>0</sup> = **Alg**f(Σ), the full subcategory of finite Σalgebras;
	- (E,M) = (surjective morphisms, injective morphisms);
	- Λ = all finite cardinal numbers;
	- *X* = all free Σ-algebras TΣX with X ∈ **Set**f.

As in (1), the class E*<sup>X</sup>* consists of all surjective morphisms.

(3) *Quantitative* Σ*-algebras.* In recent work, Mardare, Panangaden, and Plotkin [18,19] extended Birkhoff's theory to algebras endowed with a metric. Recall that an *extended metric space* is a set A with a map d<sup>A</sup> : A × A → [0,∞] (assigning to any two points a possibly infinite distance), subject to the axioms (i) dA(a, b) = 0 iff a = b, (ii) dA(a, b) = dA(b, a), and (iii) dA(a, c) ≤ dA(a, b) + dA(b, c) for all a, b, c ∈ A. A map h: A → B between extended metric spaces is *nonexpansive* if dB(h(a), h(a )) ≤ dA(a, a ) for a, a ∈ A. Let **Met**<sup>∞</sup> denote the category of extended metric spaces and nonexpansive maps. Fix a, not necessarily finitary, signature Σ, that is, the arity of an operation symbol σ ∈ Σ is any cardinal number. A *quantitative* Σ*-algebra*

is a Σ-algebra A endowed with an extended metric d<sup>A</sup> such that all Σoperations <sup>σ</sup> : <sup>A</sup><sup>n</sup> <sup>→</sup> <sup>A</sup> are nonexpansive. Here, the product <sup>A</sup><sup>n</sup> is equipped with the sup-metric dA<sup>n</sup> ((ai)i<n,(bi)i<n) = supi<n dA(ai, bi). The forgetful functor from the category **QAlg**(Σ) of quantitative Σ-algebras and nonexpansive Σ-homomorphisms to **Met**<sup>∞</sup> has a left adjoint assigning to each space X the free quantitative Σ-algebra TΣX. The latter is carried by the set of all Σ-terms (equivalently, well-founded Σ-trees) over X, with metric inherited from X as follows: if s and t are Σ-terms of the same shape, i.e. they differ only in the variables, their distance is the supremum of the distances of the variables in corresponding positions of s and t; otherwise, it is ∞.

We aim to derive the HSP theorem for quantitative algebras proved by Mardare et al. as an instance of our general results. The theorem is parametric in a regular cardinal number c > 1. In the following, an extended metric space is called c*-clustered* if it is a coproduct of spaces of size < c. Note that coproducts in **Met**<sup>∞</sup> are formed on the level of underlying sets. Choose the parameters

– *A* = *A*<sup>0</sup> = **QAlg**(Σ);


One can verify that a quotient e: A - B belongs to E*<sup>X</sup>* if and only if for each subset B<sup>0</sup> ⊆ B of cardinality < c there exists a subset A<sup>0</sup> ⊆ A such that e[A0] = B<sup>0</sup> and the restriction e: A<sup>0</sup> → B<sup>0</sup> is isometric (that is, dB(e(a), e(a )) = dA(a, a ) for a, a ∈ A0). Following the terminology of Mardare et al., such a quotient is called c*-reflexive*. Note that for c = 2 every quotient is c-reflexive, so E*<sup>X</sup>* = E. If c is infinite, E*<sup>X</sup>* is a proper subclass of E.

**Definition 3.3.** An *equation over* X ∈ *X* is a class *T*<sup>X</sup> ⊆ X *A*<sup>0</sup> that is


An object A ∈ *A satisfies* the equation *T*<sup>X</sup> if every morphism h: X → A factorizes through some e ∈ *T*X. In this case, we write

$$A = \mathcal{P}\_{X \cdot}$$

**Remark 3.4.** In many of our applications, one can simplify the above definition and replace classes of quotients by single quotients. Specifically, if *A* is E-cowellpowered (so that every equation is a set, not a class) and Λ = all cardinal numbers, then every equation *T*<sup>X</sup> ⊆ X *A*<sup>0</sup> contains a least element e<sup>X</sup> : X - EX, viz. the lower bound of all elements in *T*X. Then an object A satisfies *T*<sup>X</sup> iff it satisfies eX, in the sense that every morphism h: X → A factorizes through eX. Therefore, in this case, one may equivalently define an equation to be a morphism e<sup>X</sup> : X - E<sup>X</sup> with X ∈ *X* . This is the concept of equation investigated by Banaschewski and Herrlich [6].

**Example 3.5.** In our running examples, we obtain the following concepts:


We shall demonstrate in Sect. 5 how to interpret the above abstract notions of equations, i.e. (filters of) quotients of free algebras, in terms of concrete syntax.

**Definition 3.6.** A *variety* is a full subcategory V ⊆ *A*<sup>0</sup> closed under E*<sup>X</sup>* quotients, subobjects, and Λ-products. More precisely,


**Example 3.7.** In our examples, we obtain the following notions of varieties:


**Construction 3.8.** Given a class E of equations, put

$$\mathcal{V}(\mathbb{E}) = \{ A \in \mathcal{A}\_0^\prime : A \doteq \mathcal{P}\_X \text{ for each } \mathcal{P}\_X \in \mathbb{E} \}.$$

A subclass V ⊆ *<sup>A</sup>*<sup>0</sup> is called *equationally presentable* if <sup>V</sup> <sup>=</sup> <sup>V</sup>(E) for some <sup>E</sup>.

We aim to show that varieties coincide with the equationally presentable classes (see Theorem 3.16 below). The "easy" part of the correspondence is established by the following lemma, which is proved by a straightforward verification.

**Lemma 3.9.** *For every class* <sup>E</sup> *of equations,* <sup>V</sup>(E) *is a variety.*

As a technical tool for establishing the general HSP theorem and the corresponding sound and complete equational logic, we introduce the following concept:

**Definition 3.10.** An *equational theory* is a family of equations

$$\mathcal{P} = (\mathcal{P}\_X \subseteq X \downarrow \mathcal{J}\_0)\_{X \in \mathcal{X}}.$$

with the following two properties (illustrated by the diagrams below):


$$\begin{array}{c} X \stackrel{\scriptstyle \forall h}{\longrightarrow} Y \\ \downarrow \\ E\_X \longmapsto E\_Y \end{array} \begin{array}{c} X \\ \downarrow \\ E\_Y \end{array} \begin{array}{c} X \\ \downarrow \\ E\_X \Longrightarrow E\_Y \end{array} \begin{array}{c} Y \\ \downarrow \\ E\_X \Longrightarrow E\_Y \end{array}$$

**Remark 3.11.** In many settings, the slightly technical concept of an equational theory can be simplified. First, note that E*<sup>X</sup>* -completeness is trivially satisfied whenever E*<sup>X</sup>* = E. If, additionally, every equation contains a least element (e.g. in the setting of Remark 3.4), an equational theory corresponds exactly to a family of quotients (e<sup>X</sup> : X - EX)<sup>X</sup>∈*<sup>X</sup>* such that E<sup>X</sup> ∈ *A*<sup>0</sup> for all X ∈ *X* , and for every h: X → Y with X, Y ∈ *X* the morphism e<sup>Y</sup> · h factorizes through eX.

**Example 3.12 (Classical** Σ**-algebras).** Recall that a *congruence* on a Σalgebra A is an equivalence relation ≡ ⊆ A × A that forms a subalgebra of A × A. It is well-known that there is an isomorphism of complete lattices

$$\text{quotient algebras of } A \quad \cong \quad \text{congruences on } A \tag{3.1}$$

assigning to a quotient e: A - B its *kernel*, given by a ≡<sup>e</sup> a iff e(a) = e(a ). Consequently, in the setting of Example 3.2(1), an equational theory – presented as a family of single quotients as in Remark 3.11 – corresponds precisely to a family of congruences (≡<sup>X</sup> ⊆ TΣX × TΣX)<sup>X</sup>∈**Set** closed under substitution, that is, for every s, t ∈ TΣX and every morphism h: TΣX → TΣY in **Alg**(Σ),

$$s \equiv\_X t \quad \text{implies} \quad h(s) \equiv\_Y h(t).$$

We saw in Lemma 3.9 that every class of equations, so in particular every equational theory *T* , yields a variety V(*T* ) consisting of all objects of *A*<sup>0</sup> that satisfy every equation in *T* . Conversely, to every variety one can associate an equational theory as follows:

**Construction 3.13.** Given a variety V, form the family of equations

$$\mathcal{F}(\mathcal{V}) = (\mathcal{P}\_X \subseteq X \downarrow \mathcal{J}\_0')\_{X \in \mathcal{X}},$$

where *T*<sup>X</sup> consists of all quotients e<sup>X</sup> : X -E<sup>X</sup> with codomain E<sup>X</sup> ∈ V.

**Lemma 3.14.** *For every variety* V*, the family T* (V) *is an equational theory.*

We are ready to state the first main result of our paper, the HSP Theorem. Given two equations *T*<sup>X</sup> and *T* <sup>X</sup> over X ∈ *X* , we put *T*<sup>X</sup> ≤ *T* <sup>X</sup> if every quotient in *T* <sup>X</sup> factorizes through some quotient in *T*X. Theories form a poset with respect to the order *T* ≤ *T* iff *T*<sup>X</sup> ≤ *T* <sup>X</sup> for all X ∈ *X* . Similarly, varieties form a poset (in fact, a complete lattice) ordered by inclusion.

**Theorem 3.15 (HSP Theorem).** *The complete lattices of equational theories and varieties are dually isomorphic. The isomorphism is given by*

> V → *T* (V) *and T* → V(*T* ).

One can recast the HSP Theorem into a more familiar form, using equations in lieu of equational theories:

**Theorem 3.16 (HSP Theorem, equational version).** *A class* V ⊆ *A*<sup>0</sup> *is equationally presentable if and only if it forms a variety.*

*Proof.* By Lemma 3.9, every equationally presentable class <sup>V</sup>(E) is a variety. Conversely, for every variety V one has V = V(*T* (V)) by Theorem 3.15, so V is presented by the equations <sup>E</sup> <sup>=</sup> {*T*<sup>X</sup> : <sup>X</sup> <sup>∈</sup> *<sup>X</sup>* } where *<sup>T</sup>* <sup>=</sup> *<sup>T</sup>* (V). 

# **4 Equational Logic**

The correspondence between theories and varieties gives rise to the second main result of our paper, a generic sound and complete deduction system for reasoning about equations. The corresponding semantic concept is the following:

**Definition 4.1.** An equation *T*<sup>X</sup> ⊆ X *A*<sup>0</sup> *semantically entails* the equation *T* <sup>Y</sup> ⊆ Y *A*<sup>0</sup> if every *A*0-object satisfying *T*<sup>X</sup> also satisfies *T* <sup>Y</sup> (that is, if V(*T*X) ⊆ V(*T*<sup>Y</sup> )). In this case, we write *T*<sup>X</sup> |= *T* Y .

The key to our proof system is a categorical formulation of term substitution:

**Definition 4.2.** Let *T*<sup>X</sup> ⊆ X *A*<sup>0</sup> be an equation over X ∈ *X* . The *substitution closure* of *T*<sup>X</sup> is the smallest theory *T* = (*T* <sup>Y</sup> )<sup>Y</sup> <sup>∈</sup>*<sup>X</sup>* such that *T*<sup>X</sup> ≤ *T* <sup>X</sup>.

The substitution closure of an equation can be computed as follows:

**Lemma 4.3.** *For every equation T*<sup>X</sup> ⊆ X *A*<sup>0</sup> *one has T* = *T* (V(*T*X))*.*

The deduction system for semantic entailment consists of two proof rules:

(Weakening) *T*<sup>X</sup> *T* <sup>X</sup> for all equations *T* <sup>X</sup> ≤ *T*<sup>X</sup> over X ∈ *X* . (Substitution) *T*<sup>X</sup> *T* <sup>Y</sup> for all equations *T*<sup>X</sup> over X ∈ *X* and all Y ∈ *X* .

Given equations *T*<sup>X</sup> and *T* <sup>Y</sup> over X and Y , respectively, we write *T*<sup>X</sup> *T* <sup>Y</sup> if *T* <sup>Y</sup> arises from *T*<sup>X</sup> by a finite chain of applications of the above rules.

**Theorem 4.4 (Completeness Theorem).** *The deduction system for semantic entailment is sound and complete: for every pair of equations T*<sup>X</sup> *and T* Y *,*

$$\mathcal{P}\_X \vdash \mathcal{P}'\_Y \quad \text{iff} \quad \mathcal{P}\_X \vdash \mathcal{P}'\_Y.$$

### **5 Applications**

In this section, we present some of the applications of our categorical results (see [20] for full details). Transferring the general HSP theorem of Sect. 3 into a concrete setting requires to perform the following four-step procedure:

**Step 1.** Instantiate the parameters *A* , (E,M), *A*0, Λ and *X* of our categorical framework, and characterize the quotients in E*<sup>X</sup>* .

**Step 2.** Establish an *exactness property* for the category *A* , i.e. a correspondence between quotients e: A - B in *A* and suitable relations between elements of A.

**Step 3.** Infer a suitable syntactic notion of equation, and prove it to be expressively equivalent to the categorical notion of equation given by Definition 3.3.

**Step 4.** Invoke Theorem 3.15 to deduce an HSP theorem.

The details of Steps 2 and 3 are application-specific, but typically straightforward. In each case, the bulk of the usual work required for establishing the HSP theorem is moved to our general categorical results and thus comes for free.

Similarly, to obtain a complete deduction system in a concrete application, it suffices to phrase the two proof rules of our generic equational logic in syntactic terms, using the correspondence of quotients and relations from Step 2; then Theorem 4.4 gives the completeness result.

#### **5.1 Classical** *Σ***-Algebras**

The classical Birkhoff theorem emerges from our general results as follows.

**Step 1.** Choose the parameters of Example 3.2(1), and recall that E*<sup>X</sup>* = E. **Step 2.** The exactness property of **Alg**(Σ) is given by the correspondence (3.1).

**Step 3.** Recall from Example 3.5(1) that equations can be presented as single quotients e: TΣX - EX. The exactness property (3.1) leads to the following classical syntactic concept: a *term equation* over a set X of variables is a pair (s, t) ∈ TΣX × TΣX, denoted as s = t. It is *satisfied* by a Σ-algebra A if for every map <sup>h</sup>: <sup>X</sup> <sup>→</sup> <sup>A</sup> we have <sup>h</sup>(s) = <sup>h</sup>(t). Here, <sup>h</sup> : <sup>T</sup>Σ<sup>X</sup> <sup>→</sup> <sup>A</sup> denotes the unique extension of h to a Σ-homomorphism. Equations and term equations are expressively equivalent in the following sense:


**Step 4.** From Theorem 3.16 and Example 3.7(1), we deduce the classical

**Theorem 5.1 (Birkhoff** [7]**).** *A class of* Σ*-algebras is a variety (i.e. closed under quotients, subalgebras, products) iff it is axiomatizable by term equations.*

Similarly, one can obtain Birkhoff's complete deduction system for term equations as an instance of Theorem 4.4; see [20, Section B.1] for details.

#### **5.2 Finite** *Σ***-Algebras**

Next, we derive Eilenberg and Sch¨utzenberger's equational characterization of pseudovarieties of algebras over a finite signature Σ using our four-step plan:

**Step 1.** Choose the parameters of Example 3.2(2), and recall that E*<sup>X</sup>* = E. **Step 2.** The exactness property of **Alg**(Σ) is given by (3.1).

**Step 3.** By Example 3.2(2), an equational theory is given by a family of filters *T*<sup>n</sup> ⊆ TΣn **Alg**f(Σ) (n<ω). The corresponding syntactic concept involves sequences (s<sup>i</sup> = ti)i<ω of term equations. We say that a finite Σ-algebra A *eventually satisfies* such a sequence if there exists i<sup>0</sup> < ω such that A satisfies all equations s<sup>i</sup> = t<sup>i</sup> with i ≥ i0. Equational theories and sequences of term equations are expressively equivalent:


$$\exists i\_0 < \omega : \forall i \ge i\_0 : \forall (g \colon T\_{\Sigma} m\_i \to T\_{\Sigma} n) : e \cdot g(s\_i) = e \cdot g(t\_i).$$

Then a finite Σ-algebra eventually satisfies (s<sup>i</sup> = ti)i<ω iff it lies in V(*T* ).

**Step 4.** The theory version of our HSP theorem (Theorem 3.16) now implies:

**Theorem 5.2 (Eilenberg-Sch¨utzenberger** [12]**).** *A class of finite* Σ*-algebras is a pseudovariety (i.e. closed under quotients, subalgebras, and finite products) iff it is axiomatizable by a sequence of term equations.*

In an alternative characterization of pseudovarieties due to Reiterman [25], where the restriction to finite signatures Σ can be dropped, sequences of term equations are replaced by the topological concept of a *profinite equation*. This result can also be derived from our general HSP theorem, see [20, Section B.4].

#### **5.3 Quantitative Algebras**

In this section, we derive an HSP theorem for quantitative algebras.

**Step 1.** Choose the parameters of Example 3.2(3). Recall that we work with fixed regular cardinal c > 1 and that E*<sup>X</sup>* consists of all c-reflexive quotients. **Step 2.** To state the exactness property of **QAlg**(Σ), recall that an *(extended) pseudometric* on a set A is a map p: A×A → [0,∞] satisfying all axioms of an extended metric except possibly the implication p(a, b)=0 ⇒ a = b. Given a quantitative Σ-algebra A, a pseudometric p on A is called a *congruence* if (i) p(a, a ) ≤ dA(a, a ) for all a, a <sup>∈</sup> <sup>A</sup>, and (ii) every <sup>Σ</sup>-operation <sup>σ</sup> : <sup>A</sup><sup>n</sup> <sup>→</sup> A (σ ∈ Σ) is nonexpansive w.r.t. p. Congruences are ordered by p ≤ q iff p(a, a ) ≤ q(a, a ) for all a, a ∈ A. There is a dual isomorphism of complete lattices

quotient algebras of A ∼= congruences on A (5.1)

mapping e: A - B to the congruence p<sup>e</sup> on A given by pe(a, b) = dB(e(a), e(b)).

**Step 3.** By Example 3.5(3), equations can be presented as single quotients e: TΣX - E, where X is a c-clustered space. The exactness property (5.1) suggests to replace equations by the following syntactic concept. A c*-clustered equation* over the set X of variables is an expression

$$x\_i =\_{\varepsilon\_i} y\_i \ (i \in I) \ \vdash \ s =\_{\varepsilon} t \tag{5.2}$$

where (i) I is a set, (ii) xi, y<sup>i</sup> ∈ X for all i ∈ I, (iii) s and t are Σ-terms over X, (iv) εi, ε ∈ [0,∞], and (v) the equivalence relation on X generated by the pairs (xi, yi) (i ∈ I) has all equivalence classes of cardinality < c. In other words, the set of variables can be partitioned into subsets of size < c such that only relations between variables in the same subset appear on the left-hand side of (5.2). A quantitative Σ-algebra A *satisfies* (5.2) if for every map h: X → A with <sup>d</sup>A(h(xi), h(yi)) <sup>≤</sup> <sup>ε</sup><sup>i</sup> for all <sup>i</sup> <sup>∈</sup> <sup>I</sup>, one has <sup>d</sup>A(h(s), h(t)) <sup>≤</sup> <sup>ε</sup>. Here <sup>h</sup> : <sup>T</sup>Σ<sup>X</sup> <sup>→</sup> <sup>A</sup> denotes the unique <sup>Σ</sup>-homomorphism extending <sup>h</sup>.

Equations and c-clustered equations are expressively equivalent:

(1) Let X be a c-clustered space, i.e. X = <sup>j</sup>∈<sup>J</sup> <sup>X</sup><sup>j</sup> with <sup>|</sup>X<sup>j</sup> <sup>|</sup> < c. Every equation e: TΣX - E induces a set of c-clustered equations over X given by

$$x =\_{\varepsilon\_{x,y}} y \; (j \in J, x, y \in X\_j) \vdash \; s =\_{\varepsilon\_{s,t}} t \quad (s, t \in T\_\Sigma X), \tag{5.3}$$

with εx,y = dX(x, y) and εs,t = dE(e(s), e(t)). It is not difficult to show that e and (5.3) are equivalent: an algebra satisfies e iff it satisfies all equations (5.3).

	- Let p the largest pseudometric on X with p(xi, yi) ≤ ε<sup>i</sup> for all i (that is, the pointwise supremum of all such pseudometrics). Form the corresponding quotient e<sup>p</sup> : X - Xp, see (5.1). It is easy to see that X<sup>p</sup> is c-clustered.

**Step 4.** From Theorem 3.16 and Example 3.7(3), we deduce the following

**Theorem 5.3 (Quantitative HSP Theorem).** *A class of quantitative* Σ*algebras is a* c*-variety (i.e. closed under* c*-reflexive quotients, subalgebras, and products) iff it is axiomatizable by* c*-clustered equations.*

The above theorem generalizes a recent result of Mardare, Panangaden, and Plotkin [19] who considered only signatures Σ with operations of finite or countably infinite arity and cardinal numbers c ≤ ℵ1. Theorem 5.3 holds without any restrictions on Σ and c. In addition to the quantitative HSP theorem, one can also derive the completeness of quantitative equational logic [18] from our general completeness theorem, see [20, Section B.5] for details.

#### **5.4 Nominal Algebras**

In this section, we derive an HSP theorem for algebras in the category **Nom** of nominal sets and equivariant maps; see Pitts [24] for the required terminology. We denote by A the countably infinite set of atoms, by Perm(A) the group of finite permutations of A, and by suppX(x) the least support of an element x of a nominal set <sup>X</sup>. Recall that <sup>X</sup> is *strong* if, for all <sup>x</sup> <sup>∈</sup> <sup>X</sup> and <sup>π</sup> <sup>∈</sup> Perm(A),

$$[\forall a \in \textsf{supp}\_X(x) : \pi(a) = a] \quad \Longleftrightarrow \quad \pi \cdot x = x. \text{!} $$

<sup>A</sup> *supported set* is a set <sup>X</sup> equipped with a map supp<sup>X</sup> : <sup>X</sup> → P<sup>f</sup> (A). A *morphism* f : X → Y of supported sets is a function with supp<sup>Y</sup> (f(x)) ⊆ suppX(x) for all x ∈ X. Every nominal set X is a supported set w.r.t. its least-support map suppX. The following lemma, whose first part is a reformulation of [21, Prop. 5.10], gives a useful description of strong nominal sets in terms of supported sets.

**Lemma 5.4.** *The forgetful functor from* **Nom** *to* **SuppSet** *has a left adjoint* F : **SuppSet** → **Nom***. The nominal sets of the form* F Y *(*Y ∈ **SuppSet***) are up to isomorphism exactly the strong nominal sets.*

Fix a finitary signature Σ. A *nominal* Σ*-algebra* is a Σ-algebra A carrying the structure of a nominal set such that all <sup>Σ</sup>-operations <sup>σ</sup> : <sup>A</sup><sup>n</sup> <sup>→</sup> <sup>A</sup> are equivariant. The forgetful functor from the category **NomAlg**(Σ) of nominal Σ-algebras and equivariant Σ-homomorphisms to **Nom** has a left adjoint assigning to each nominal set X the *free nominal* Σ*-algebra* TΣX, carried by the set of Σ-terms and with group action inherited from X. To derive a nominal HSP theorem from our general categorical results, we proceed as follows.

**Step 1.** Choose the parameters of our setting as follows:

– *A* = *A*<sup>0</sup> = **NomAlg**(Σ);


One can show that a quotient e: A - B belongs to E*<sup>X</sup>* iff it is *supportreflecting*: for every b ∈ B there exists a ∈ A with e(a) = b and suppA(a) = suppB(b).

**Step 2.** A *nominal congruence* on a nominal Σ-algebra A is a Σ-algebra congruence ≡ ⊆ A × A that forms an equivariant subset of A × A. In analogy to (3.1), there is an isomorphism of complete lattices

quotient algebras of A ∼= nominal congruences on A. (5.4)

**Step 3.** By Remark 3.4, an equation can be presented as a single quotient e: TΣX - E, where X is a strong nominal set. Equations can be described by syntactic means as follows. A *nominal* Σ*-term* over a set Y of variables is an element of <sup>T</sup>Σ(Perm(A) <sup>×</sup> <sup>Y</sup> ). Every map <sup>h</sup>: <sup>Y</sup> <sup>→</sup> <sup>A</sup> into a nominal Σ-algebra A extends to the Σ-homomorphism

$$\hat{h} = \langle T\_{\Sigma}(\text{Perm}(\mathbb{A}) \times Y) \xrightarrow{T\_{\Sigma}(\text{Perm}(\mathbb{A}) \times h)} T\_{\Sigma}(\text{Perm}(\mathbb{A}) \times A) \xrightarrow{T\_{\Sigma}(-\cdot-)} T\_{\Sigma}A \xrightarrow{id^{\sharp}} A \rangle$$

where *id* is the unique Σ-homomorphism extending the identity map id: A → A. A *nominal equation* over Y is an expression of the form

$$
\mathfrak{supp}\mathfrak{p}\_Y \vdash s = t,\tag{5.5}
$$

where supp<sup>Y</sup> : <sup>Y</sup> → P<sup>f</sup> (A) is a function and <sup>s</sup> and <sup>t</sup> are nominal <sup>Σ</sup>-terms over Y . A nominal Σ-algebra A *satisfies* the equation supp<sup>Y</sup> s = t if for every map h: Y → A with suppA(h(y)) ⊆ supp<sup>Y</sup> (y) for all y ∈ Y one has hˆ(s) = hˆ(t). Equations and nominal equations are expressively equivalent:

(1) Given an equation e: TΣX - E with X a strong nominal set, choose a supported set Y with X = F Y , and denote by η<sup>Y</sup> : Y → F Y the universal map (see Lemma 5.4). Form the nominal equations over Y given by

$$\text{supp}\_Y \vdash s = t \quad \text{(\$s, t \in T\_\Sigma\$(Perm(A) \times Y)\$ and \$e \cdot T\_\Sigma m(s) = e \cdot T\_\Sigma m(t)\$)} \tag{5.6}$$

where <sup>m</sup> is the composite Perm(A) <sup>×</sup> <sup>Y</sup> Perm(A)×η<sup>Y</sup> −−−−−−−−→ Perm(A) <sup>×</sup> <sup>X</sup> −·− −−−→ X. It is not difficult to see that a nominal Σ-algebra satisfies e iff it satisfies (5.6).


**Theorem 5.5 (Kurz and Petri¸san** [16]**).** *A class of nominal* Σ*-algebras is a variety (i.e. closed under support-reflecting quotients, subalgebras, and products) iff it is axiomatizable by nominal equations.*

For brevity and simplicity, in this section we restricted ourselves to algebras for a signature. Kurz and Petri¸san proved a more general HSP theorem for algebras over an endofunctor on **Nom** with a suitable finitary presentation. This extra generality allows to incorporate, for instance, algebras for binding signatures.

#### **5.5 Further Applications**

Let us briefly mention some additional instances of our framework, all of which are given a detailed treatment in the full arXiv paper [20].

**Ordered Algebras.** Bloom [8] proved an HSP theorem for Σ-algebras in the category of posets: a class of such algebras is closed under homomorphic images, subalgebras, and products, iff it is axiomatizable by inequations s ≤ t between Σ-terms. This result can be derived much like the unordered case in Sect. 5.1.

**Continuous Algebras.** A more intricate ordered version of Birkhoff's theorem concerns *continuous algebras*, i.e. Σ-algebras with an ω-cpo structure on their underlying set and continuous Σ-operations. Ad´amek, Nelson, and Reiterman [3] proved that a class of continuous algebras is closed under homomorphic images, subalgebras, and products, iff it axiomatizable by inequations between terms with formal suprema (e.g. σ(x) ≤ ∨i<ω ci). This result again emerges as an instance of our general HSP theorem. A somewhat curious feature of this application is that the appropriate factorization system (E,M) takes as E the class of dense morphisms, i.e. morphisms of E are not necessarily surjective. However, one has E*<sup>X</sup>* = surjections, so homomorphic images are formed in the usual sense.

**Abstract HSP Theorems.** Our results subsume several existing categorical generalizations of Birkhoff's theorem. For instance, Theorem 3.15 yields Manes' [17] correspondence between quotient monads T - T and varieties of T-algebras for any monad T on **Set**. Similarly, Banaschewski and Herrlich's [6] HSP theorem for objects in categories with enough projectives is a special case of Theorem 3.16.

# **6 Conclusions and Future Work**

We have presented a categorical approach to the model theory of algebras with additional structure. Our framework applies to a broad range of different settings and greatly simplifies the derivation of HSP-type theorems and completeness results for equational deduction systems, as the generic part of such derivations now comes for free using our Theorems 3.15, 3.16 and 4.4. There remain a number of interesting directions and open questions for future work.

As shown in Sect. 5, the key to arrive at a syntactic notion of equation lies in identifying a correspondence between quotients and suitable relations, which we informally coined "exactness". The similarity of these correspondences in our applications suggests that there should be a (possibly enriched) notion of *exact category* that covers our examples; cf. Kurz and Velebil's [15] 2-categorical view of ordered algebras. This would allow to move more work to the generic theory.

Theorem 4.4 can be used to recover several known sound and complete equational logics, but it also applies to settings where no such logic is known, for instance, a logic of profinite equations (however, cf. recent work of Almeida and Kl´ıma [5]). In each case, the challenge is to translate our two abstract proof rules into concrete syntax, which requires the identification of a syntactic equivalent of the two properties of an equational theory. While substitution invariance always translates into a syntactic substitution rule in a straightforward manner, E*<sup>X</sup>* completeness does not appear to have an obvious syntactic counterpart. In most of the cases where a concrete equational logic is known, this issue is obfuscated by the fact that one has E*<sup>X</sup>* = E, so E*<sup>X</sup>* -completeness becomes a trivial property. Finding a syntactic account of E*<sup>X</sup>* -completeness remains an open problem. One notable case where E*<sup>X</sup>* = E is the one of nominal algebras. Gabbay's work [13] does provide an HSP theorem and a sound and complete equational logic in a setting slightly different from Sect. 5.4, and it should be interesting to see whether this can be obtained as an instance of our framework.

Finally, in previous work [29] we have introduced the notion of a *profinite theory* (a special case of the equational theories in the present paper) and shown how the dual concept can be used to derive Eilenberg-type correspondences between varieties of languages and pseudovarieties of finite algebras. Our present results pave the way to an extension of this method to new settings, such as nominal sets. Indeed, a simple modification of the parameters in Sect. 5.4 yields a new HSP theorem for *orbit-finite* nominal Σ-algebras. We expect that a dualization of this result in the spirit of *loc. cit.* leads to a correspondence between varieties of data languages and varieties of orbit-finite nominal monoids, an important step towards an algebraic theory of data languages.

**Acknowledgement.** The authors would like to thank Thorsten Wißmann for insightful discussions on nominal sets.



**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Towards a Structural Proof Theory of Probabilistic** *µ***-Calculi**

Christophe Lucas<sup>1</sup> and Matteo Mio<sup>2</sup>

<sup>1</sup> ENS–Lyon, Lyon, France, christophe.lucas@ens-lyon.fr
<sup>2</sup> CNRS and ENS–Lyon, Lyon, France, matteo.mio@ens-lyon.fr

**Abstract.** We present a structural proof system, based on the machinery of hypersequent calculi, for a simple probabilistic modal logic underlying very expressive probabilistic *µ*-calculi. We prove the soundness and completeness of the proof system with respect to an equational axiomatisation, as well as the fundamental cut-elimination theorem.

# **1 Introduction**

Modal and temporal logics are formalisms designed to express properties of mathematical structures representing the behaviour of computing systems, such as, e.g., Kripke frames, trees and labeled transition systems. A fundamental problem regarding such logics is the *equivalence problem*: given two formulas φ and ψ, establish whether φ and ψ are semantically equivalent. For many temporal logics, including the basic modal logic K (see, e.g., [BdRV02]) and its many extensions such as the *modal* μ*-calculus* [Koz83], the equivalence problem is decidable and can be answered automatically. This is, of course, a very desirable fact. However, a fully automatic approach is not always viable due to the high complexity of the algorithms involved. An alternative and complementary approach is to use *human-aided* proof systems for constructing *formal proofs* of the desired equalities. As a concrete example, the well-known equational axioms of Boolean algebras together with two axioms for the ♦ modality:

$$
\Diamond \bot = \bot \qquad \qquad \Diamond(x \lor y) = \Diamond(x) \lor \Diamond(y)
$$

can be used to construct formal proofs of all valid equalities between formulas of modal logic using the familiar deductive rules of *equational logic* (see Definition 3). The simplicity of equational logic is a great feature of this kind of system, but it sometimes comes at a cost: even seemingly trivial equalities often require significant human ingenuity to be proved.<sup>1</sup> The problem lies in the *transitivity rule* (a = b & b = c ⇒ a = c), which requires guessing, among infinitely many possibilities, an interpolant formula b in order to prove the equality a = c.

<sup>1</sup> Example: the law of idempotence *x* ∨ *x* = *x* can be derived from the standard axioms of Boolean algebras (i.e., complemented distributive lattices) as: *x* ∨ *x* = (*x* ∨ *x*) ∧ ⊤ = (*x* ∨ *x*) ∧ (*x* ∨ ¬*x*) = *x* ∨ (*x* ∧ ¬*x*) = *x* ∨ ⊥ = *x*.

The authors were supported by the French project ANR-16-CE25-0011 REPAS.

© The Author(s) 2019

M. Bojańczyk and A. Simpson (Eds.): FOSSACS 2019, LNCS 11425, pp. 418–435, 2019. https://doi.org/10.1007/978-3-030-17127-8_24

The field of *structural proof theory* (see [Bus98]), which originated with Gentzen's seminal work on his *sequent calculus* proof system LK for classical propositional (and first-order) logic [Gen34], investigates proof systems which, roughly speaking, require less human ingenuity. The key technical result regarding the sequent calculus, the *cut-elimination theorem*, implies that when searching for a proof of a statement, only certain formulas need to be considered: the so-called *sub-formula property*. In practice, this significantly simplifies the *proof search* endeavour. Gentzen's original system LK has been extensively investigated and generalised: for example, it can be extended with rules for the ♦ modality, yielding a convenient proof system for modal logic [Wan96], and furthermore with rules for dealing with (co)inductive definitions, yielding a proof system for the modal μ-calculus (see, e.g., [Stu07]). The structural proof theory of the modal μ-calculus remains an active area of research (see, e.g., the recent [Dou17]).

*Probabilistic Logics and the Riesz Modal Logic.* Probabilistic logics are temporal logics specifically designed to express properties of mathematical structures (e.g., Markov chains and Markov decision processes) representing the behaviour of computing systems with probabilistic features such as random bit generation. Unlike the non-probabilistic case, the equivalence problem for most expressive probabilistic logics (e.g., *pCTL* [LS82,HJ94], see also [BK08,BBLM17]) is not known to be decidable. Hence, human-aided proof systems are currently the only viable approach to establishing equalities between formulas of expressive probabilistic logics. To the best of our knowledge, however, all the proof systems proposed in the literature (see, e.g., [DFHM16] for the logic pCTL, [BGZB09,Hsu17] for pRHL and [Koz85] for pPDL) are not entirely satisfactory because they include rules, such as the transitivity rule discussed above, that violate the sub-formula property.

Another line of work on probabilistic logics has focused on *probabilistic* μ*-calculi* ([MM07,HK97,DGJP00,dA03,MS17,Mio11,Mio12a,Mio14]). These logical formalisms are, similarly to Kozen's modal μ-calculus, obtained by extending a base *real-valued* modal logic with (co)inductively defined operators. Recently, in [MFM17], a base real-valued modal logic called the *Riesz modal logic* (R) has been defined and a sound and complete equational axiomatisation has been obtained (see Definition 2). Importantly, the logic R extended with (co)inductively defined operators is sufficiently expressive to interpret most other probabilistic logics, including pCTL [Mio12b,Mio18,MS13a]. Hence, the Riesz modal logic appears to be a convenient base for developing the theory of probabilistic μ-calculi and, more generally, probabilistic logics.

*Contributions of This Work.* This work is a first step towards the development of the structural proof theory of probabilistic μ-calculi. We introduce a *hypersequent calculus* called MGA (read *modal* GA) for a version of the Riesz modal logic (the *scalar-free fragment*, see Sect. 2 for details) and prove the corresponding cut-elimination theorem. Formally we prove:

**Theorem 1.** *The hypersequent calculus MGA is sound and complete with respect to the equational axioms of Fig. 1 and the CUT rule is eliminable.*

The machinery of hypersequent calculi has been introduced by Avron in [Avr87] and, independently, by Pottinger in [Pot83]. Our calculus extends the hypersequent calculus GA of Metcalfe, Olivetti and Gabbay [MOG05] (see also the book [MOG09] and the related [CM03] and [DMS18]) which is a sound and complete structural proof system for the equational theory of lattice-ordered abelian groups (axioms (1) in Fig. 1, see [Vul67] for an overview). The main contributions of this work are:


In particular, the last point above guarantees the correctness of the proofs of all our novel technical results which, as it is often the case in proof theory, involve complex and long induction arguments. Given the availability of formalised proofs, in this work we focus on illustrating the main ideas behind our arguments rather than spelling out all technical details.

*Organisation of the Paper.* In Sect. 2 we provide the necessary definitions about the Riesz modal logic from [MFM17,Mio18] and about the hypersequent calculus GA of [MOG05,MOG09]. In Sect. 3 we present our hypersequent calculus MGA and state the main theorems. In Sect. 4 we sketch the main ideas behind our proof of cut-elimination. Lastly, in Sect. 5 we discuss some directions for future work.

# **2 Technical Background**

#### **2.1 The Riesz Modal Logic and Its Scalar-free Fragment**

The Riesz modal logic R introduced in [MFM17] is a probabilistic logic for expressing properties of discrete or continuous Markov chains. We refer to [MFM17] for a detailed introduction. Here we just restrict ourselves to the purely *syntactical* aspects of this logic: its syntax and its axiomatisation.

**Definition 1 (Syntax).** *The set of formulas of the Riesz modal logic is generated by the following grammar:* φ, ψ ::= x | 0 | 1 | φ + ψ | rφ | φ ⊔ ψ | φ ⊓ ψ | ♦φ *where* r*, called a* scalar*, ranges over the set* R *of real numbers. We just write* −φ *in place of* (−1)φ*.*

A main result of [MFM17] is that two formulas φ and ψ are semantically equivalent if and only if the identity φ = ψ holds in all *modal Riesz spaces*.

**Definition 2.** *A modal Riesz space is an algebraic structure* R *over the signature* Σ = {0, 1, +, r, ⊔, ⊓, ♦} *(for* r ∈ R*) such that the following set* R *of axioms holds:*

	- ♦(x + y) = ♦(x) + ♦(y)*,*
	- *if* x ≥ 0 *then* ♦(x) ≥ 0 *(i.e.,* 0 = 0 ⊓ ♦(x ⊔ 0)*),*
	- ♦(1) ≤ 1 *(i.e.,* ♦1 = ♦1 ⊓ 1*).*

Note that the definition of modal Riesz spaces is purely equational: all axioms of Riesz spaces (1) can be expressed equationally and so can the axioms (2) and (3). This means, by Birkhoff's completeness theorem, that two formulas are semantically equivalent if and only if the identity φ = ψ can be derived using the familiar deductive rules of equational logic, written as R ⊢ φ = ψ.

**Definition 3 (Deductive Rules of Equational Logic).** *Rules for deriving identities from a set* A *of equational axioms:*

$$\begin{array}{lll}
\dfrac{(t_1 = t_2) \in \mathcal{A}}{\mathcal{A} \vdash t_1 = t_2}\; Ax &
\dfrac{}{\mathcal{A} \vdash t = t}\; refl &
\dfrac{\mathcal{A} \vdash t_2 = t_1}{\mathcal{A} \vdash t_1 = t_2}\; sym \\[2.5ex]
\dfrac{\mathcal{A} \vdash t_1 = t_2}{\mathcal{A} \vdash C[t_1] = C[t_2]}\; ctxt &
\dfrac{\mathcal{A} \vdash t_1 = t_2 \quad \mathcal{A} \vdash t_2 = t_3}{\mathcal{A} \vdash t_1 = t_3}\; trans &
\dfrac{\mathcal{A} \vdash f(\mathbf{s}, x, \mathbf{u}) = g(\mathbf{w}, x, \mathbf{z})}{\mathcal{A} \vdash f(\mathbf{s}, t, \mathbf{u}) = g(\mathbf{w}, t, \mathbf{z})}\; subst
\end{array}$$

*where* C[·] *is a context and* f,g *are function symbols of the fixed signature.*
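To make the *subst* rule concrete, here is a small sketch of uniform substitution of a term for a variable on both sides of a derived identity; the tuple encoding of terms is ours, not the paper's:

```python
def substitute(term, var, repl):
    """Uniformly replace the variable `var` by the term `repl` in `term`.
    Terms are tuples: ('var', name) for variables, (op, arg1, ..., argk)
    for applications of a function symbol op."""
    if term == ('var', var):
        return repl
    if term[0] == 'var':
        return term                     # a different variable: unchanged
    return (term[0],) + tuple(substitute(t, var, repl) for t in term[1:])

# From A ⊢ x ⊔ x = x, the subst rule yields A ⊢ (y ⊓ z) ⊔ (y ⊓ z) = y ⊓ z:
lhs, rhs = ('join', ('var', 'x'), ('var', 'x')), ('var', 'x')
t = ('meet', ('var', 'y'), ('var', 'z'))
new_lhs, new_rhs = substitute(lhs, 'x', t), substitute(rhs, 'x', t)
```

The key point, mirrored by the rule above, is that the same term t replaces the variable everywhere at once, on both sides of the identity.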

In what follows we denote with R ⊢ φ ≤ ψ the judgment R ⊢ φ = φ ⊓ ψ. The following elementary facts from the theory of Riesz spaces (see, e.g., [LZ71, §2.12]) will be useful.

**Proposition 1.** *The following assertions hold:*

*–* R ⊢ φ = ψ *iff* R ⊢ φ − ψ = 0*,*
*–* R ⊢ φ = ψ *iff* R ⊢ φ ≤ ψ *and* R ⊢ ψ ≤ φ*,*
*–* R ⊢ r(x ⊔ y) = rx ⊔ ry *and* R ⊢ r(x ⊓ y) = rx ⊓ ry*, for every scalar* r ≥ 0*.*

The first point says that an equality φ = ψ can always be expressed as an identity with 0. The second point says that we can express equalities with inequalities and *vice versa*. The third point, together with the other axioms, implies that scalar multiplication distributes over all other operations {+, ⊔, ⊓, ♦}.


For most practical purposes (when expressing properties of probabilistic models) the scalars in the Riesz modal logic can be restricted to be rational numbers.

**Definition 4 (Rational and Scalar-free formulas).** *A formula* φ *is* rational *if all its scalars are rational numbers. Similarly,* φ *is* scalar-free *if its scalars are all equal to* (−1)*. Equivalently, the set of scalar-free formulas is generated by the following grammar:* A, B ::= x | 0 | 1 | A + B | −A | A ⊔ B | A ⊓ B | ♦(A)*.*

Note how we have switched to the letters A and B to range over scalar-free formulas to highlight this distinction.

**Proposition 2.** *Let* φ *be a rational formula. Then there exists a scalar-free formula* A *such that* R ⊢ φ = 0 *iff* R ⊢ A = 0*.*

*Proof.* Let {r*<sup>i</sup>*}*<sup>i</sup>*∈*<sup>I</sup>* be the list of rational scalars in φ, with r*<sup>i</sup>* = n*<sup>i</sup>*/m*<sup>i</sup>*, and let d = ∏*<sup>i</sup>* m*<sup>i</sup>* be the product of all denominators. Since scalar multiplication distributes over all operations, it is easy to show that R ⊢ dφ = ψ for a formula ψ whose scalars are all integers. We can then obtain A from ψ by inductively replacing any sub-formula of ψ of the form nB with (B + B + ··· + B) (n times) if n is positive, with −(B + B + ··· + B) if n is negative, and with 0 if n = 0.
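The translation in this proof can be sketched in code. The encoding below is ours, not the paper's: formulas are nested tuples with `Fraction` scalars, and for simplicity we assume scalar applications are not nested:

```python
from fractions import Fraction
from math import prod

# Formulas: ('x',), ('0',), ('1',), ('+', a, b), ('u', a, b) for ⊔,
# ('n', a, b) for ⊓, ('d', a) for ♦, ('-', a), ('scal', Fraction, a).

def denominators(phi):
    """Collect the denominators of all rational scalars occurring in phi."""
    if phi[0] == 'scal':
        return [phi[1].denominator] + denominators(phi[2])
    return [d for sub in phi[1:] for d in denominators(sub)]

def repeat(n, a):
    """a + a + ... + a, n > 0 times."""
    body = a
    for _ in range(n - 1):
        body = ('+', body, a)
    return body

def scale(n, phi):
    """Push multiplication by a positive integer n through phi, using that
    multiplication by n > 0 distributes over every operation."""
    op = phi[0]
    if op == '0':
        return phi
    if op in ('x', '1'):
        return repeat(n, phi)            # n·x becomes x + ... + x
    if op == 'scal':                     # n·(r·A) = (n·r)·A, now an integer
        m = n * phi[1]
        assert m.denominator == 1
        m = m.numerator
        if m == 0:
            return ('0',)
        return repeat(m, phi[2]) if m > 0 else ('-', repeat(-m, phi[2]))
    if op in ('-', 'd'):
        return (op, scale(n, phi[1]))
    return (op, scale(n, phi[1]), scale(n, phi[2]))   # +, ⊔, ⊓

def to_scalar_free(phi):
    """Multiply phi by the product d of all denominators, yielding a
    scalar-free A with phi = 0 iff A = 0 (since d > 0)."""
    return scale(prod(denominators(phi), start=1), phi)
```

For instance, `to_scalar_free` sends (1/2)x + (1/3)x to (x + x + x) + (x + x), i.e., d = 6 clears both denominators before the repeated-sum expansion.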

For this reason, in this work we restrict attention to scalar-free formulas and consider the restricted set of axioms T of Fig. 1. The axioms of Riesz spaces, when scalar multiplication is omitted, reduce to the axioms of *lattice-ordered abelian groups* (see, e.g., [Vul67]). The axiom 0 ≤ 1 is unaltered and the axioms for the ♦ modality are naturally adapted. For these reasons we refer to these axioms as those of *lattice-ordered modal abelian groups*.

**Fig. 1.** Set of axioms <sup>T</sup> of lattice-ordered modal Abelian groups.

*Remark 1.* Note that from the previous discussion it does not follow directly that R ⊢ A = B implies T ⊢ A = B. We indeed conjecture that R is a conservative extension of T, but we have not proved this fact so far. In any case, this is not required for the results of this work.

The main contribution of this work is the design of a sound and complete hypersequent calculus for the theory T and the proof of cut-elimination.

#### **2.2 The Hypersequent Calculus GA**

Our starting point is the hypersequent calculus GA of [MOG05,MOG09] for the theory of lattice-ordered abelian groups (set of axioms (1) in Fig. 1).

**Definition 5 (Formulas, Sequents and hypersequents).** *A* formula A *is a term built from a set of variables (ranged over by* x, y, z*) over the signature* {0, +, −, ⊔, ⊓}*. A* sequent S *is a pair of two (possibly empty) multisets of formulas* Γ = A0,...,A*<sup>n</sup>* *and* Δ = B0,...,B*<sup>m</sup>*, *denoted as* Γ ⊢ Δ*. A* hypersequent G *is a nonempty multiset* S1,...,S*<sup>n</sup>* *of sequents, denoted as* S1 | ... | S*<sup>n</sup>*.

Following [MOG05,MOG09], with some abuse of notation, we denote with S both the sequent and the hypersequent consisting of only the sequent S. The system GA is a deductive system for deriving hypersequents consisting of the rules of Fig. 2. The system GA without the CUT rule is denoted by GA∗.

Another convention we adopt from [MOG05,MOG09] is to write d ⊢GA G to express the fact that d is a valid GA-derivation of the hypersequent G. We write ⊢GA G to express the existence of a GA-derivation d such that d ⊢GA G. Similarly, we write d ⊢GA∗ G and ⊢GA∗ G when referring to the subsystem GA∗.


**Fig. 2.** Inference rules of the hypersequent system GA of [MOG05].

Multisets of formulas, sequents and hypersequents are interpreted as a single formula as follows:

**Definition 6 (Interpretation).** *A multiset of formulas* Γ = φ1,...,φ*<sup>n</sup>* *is interpreted as the formula* ⟦Γ⟧ = φ1 + φ2 + ··· + φ*<sup>n</sup>* *if* n ≥ 1 *and as* ⟦Γ⟧ = 0 *if* Γ = ∅*. A sequent* S = Γ ⊢ Δ *is interpreted as the formula* ⟦S⟧ = ⟦Δ⟧ − ⟦Γ⟧*. Finally, a hypersequent* G = S0 | ··· | S*<sup>n</sup>* *is interpreted as the formula* ⟦G⟧ = ⟦S0⟧ ⊔ ··· ⊔ ⟦S*<sup>n</sup>*⟧*.*

*Example 1.* Consider the hypersequent G = (0 ⊓ x, y ⊢ y) | (−y ⊢) consisting of two sequents. Then ⟦G⟧ = (y − ((0 ⊓ x) + y)) ⊔ (0 − (−y)).
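Definition 6 transcribes directly into code. In the fragment below the encoding is ours: formulas are Python functions from environments to reals, ⊔ is read as max (as in the real-valued model used in the proof of Theorem 2), and we read Example 1's hypersequent as G = (0 ⊓ x, y ⊢ y) | (−y ⊢):

```python
def interp_multiset(gamma):
    """Interpret a multiset of formulas as their sum (empty multiset -> 0)."""
    return lambda env: sum(f(env) for f in gamma)

def interp_sequent(seq):
    """Interpret a sequent (Gamma, Delta) as sum(Delta) - sum(Gamma)."""
    gamma, delta = seq
    return lambda env: interp_multiset(delta)(env) - interp_multiset(gamma)(env)

def interp_hypersequent(g):
    """Interpret a hypersequent as the join (max) of its sequents."""
    return lambda env: max(interp_sequent(s)(env) for s in g)

x = lambda env: env['x']
y = lambda env: env['y']
zero_meet_x = lambda env: min(0, env['x'])   # 0 ⊓ x, with ⊓ read as min
neg_y = lambda env: -env['y']                # −y

G = [([zero_meet_x, y], [y]), ([neg_y], [])]
# At x = -2, y = 3: max(3 - (min(0, -2) + 3), 0 - (-3)) = max(2, 3) = 3
val = interp_hypersequent(G)({'x': -2, 'y': 3})
```

The sample point is of course arbitrary; the interpretation itself is exactly the one of Definition 6, specialised to the reals.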

The soundness and completeness of the hypersequent system GA with respect to the theory of lattice-ordered abelian groups (axioms (1) of Fig. 1, written as T(1)) is expressed by the following theorem.

**Theorem 2 (**[MOG05]**).** *For all formulas* A *and hypersequents* G*:*

*Soundness: if* ⊢GA G *then* T(1) ⊢ ⟦G⟧ ≥ 0*. Completeness: if* T(1) ⊢ A ≥ 0 *then* ⊢GA (⊢ A)*.*

*Proof.* The proofs presented in [MOG05] exploit the following well-known fact (see, e.g., [Vul67]): the equality A = B holds in all lattice-ordered abelian groups if and only if it holds in (R, 0, +, −, max, min) under any interpretation of the variables as real numbers. In other words, R generates the variety of lattice-ordered abelian groups.
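Since (R, 0, +, −, max, min) generates the variety, candidate identities can be screened cheaply by evaluation under random real interpretations before attempting a GA-derivation. A sketch (the sampling scheme and tolerance are ours):

```python
import random

def plausibly_holds(lhs, rhs, variables, trials=1000):
    """Test a candidate identity of lattice-ordered abelian groups in
    (R, 0, +, -, max, min) under random real interpretations. A failed
    trial refutes the identity; success only makes it plausible."""
    for _ in range(trials):
        env = {v: random.uniform(-100.0, 100.0) for v in variables}
        if abs(lhs(env) - rhs(env)) > 1e-9:
            return False
    return True

# x + (y ⊔ z) = (x + y) ⊔ (x + z) holds in every lattice-ordered abelian group
assert plausibly_holds(lambda e: e['x'] + max(e['y'], e['z']),
                       lambda e: max(e['x'] + e['y'], e['x'] + e['z']),
                       ['x', 'y', 'z'])
# x ⊔ y = x + y is refuted almost immediately
assert not plausibly_holds(lambda e: max(e['x'], e['y']),
                           lambda e: e['x'] + e['y'],
                           ['x', 'y'])
```

By the fact above, a random counterexample over the reals is a genuine refutation in the variety, while passing all trials is only heuristic evidence.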

The main result of [MOG05] regarding GA is that the CUT rule is eliminable.

**Theorem 3 (Cut-elimination** [MOG05]**).** *Any GA-derivation of a hypersequent* G *can be effectively transformed into a GA*∗*-derivation of* G*.*

### **3 The Hypersequent System MGA**

In this section we introduce our hypersequent calculus MGA, a modal extension of the system GA of [MOG05]. The system MGA deals with formulas over the signature of modal lattice-ordered abelian groups (see Fig. 1), thus including the constant 1 and the unary modality ♦.

**Definition 7 (Formulas of MGA).** *A* formula A *is a term built from a set of variables (ranged over by* x, y, z*) over the signature* {0, 1, +, −, ⊔, ⊓, ♦}*.*

The definitions of sequents and hypersequents are given exactly as for the system GA in Definition 5 of Sect. 2.2. Similarly, multisets of formulas, sequents and hypersequents are interpreted as formulas exactly as already specified in Definition 6 of Sect. 2.2 for the system GA. Before presenting the deduction rules of MGA, it is useful to introduce the following abbreviations.


The rules of the system MGA consist of all rules of GA (see Fig. 2) together with the additional rules of Fig. 3.


**Fig. 3.** Additional inference rules of the hypersequent system MGA

The axiom (1-ax) for the constant 1 is straightforward: it simply expresses the axiom 0 ≤ 1 from Fig. 1 (i.e., T ⊢ 1 ≥ 0).

The rule (♦-rule) for the modality is more subtle as it imposes strong constraints on the shape of its premise and conclusion. First, both the conclusion and the premise are required to be hypersequents consisting of exactly one sequent. Furthermore, in the conclusion, all formulas, except those of the form 1 on the right side, need to be of the form ♦C for some C.
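Operationally, the constraint says that a ♦-rule conclusion is a single sequent whose left-hand formulas are all ♦-formulas and whose right-hand formulas are either ♦-formulas or copies of the constant 1. A small sketch of this shape check (the tuple encoding is ours, and only the conclusion's shape is checked, not the full rule):

```python
def matches_diamond_conclusion(hypersequent):
    """True iff a hypersequent has the shape required of a ♦-rule
    conclusion: exactly one sequent, every left formula of the form ♦C,
    every right formula either of the form ♦C or the constant 1."""
    if len(hypersequent) != 1:
        return False                      # exactly one component required
    gamma, delta = hypersequent[0]
    return (all(a[0] == 'diamond' for a in gamma) and
            all(b[0] in ('diamond', 'one') for b in delta))

dia = lambda c: ('diamond', c)
x, y = ('var', 'x'), ('var', 'y')

# ♦x, ♦y ⊢ ♦x, 1, 1 matches; x, ♦y ⊢ ♦y, x does not (x is not a ♦-formula)
ok = matches_diamond_conclusion([([dia(x), dia(y)], [dia(x), ('one',), ('one',)])])
bad = matches_diamond_conclusion([([x, dia(y)], [dia(y), x])])
```

This is the syntactic filter that, as discussed in Sect. 4, blocks a naive use of ♦-rule invertibility on sequents such as x, ♦B ⊢ ♦B, x.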

The following is an illustrative example of a derivation in the system MGA:

(The derivation is displayed as a list of proof steps; each sequent follows from the preceding one(s) by the indicated rule.)

$$
\begin{array}{ll}
1 \vdash 1 & \text{ID-ax}\\
A \vdash A & \text{ID-ax}\\
A, 1 \vdash 1, A & \text{M}\\
A, 1, -(A) \vdash 1 & -_L\\
A, 1 - A \vdash 1 & +_L\\
A, A \vdash 1 \mid A, 1 - A \vdash 1 & \text{W}\\
A, A \sqcap (1 - A) \vdash 1 & \sqcap_L\\
A, A \sqcap (1 - A) \vdash 1 \mid 1 - A, A \sqcap (1 - A) \vdash 1 & \text{W}\\
A \sqcap (1 - A), A \sqcap (1 - A) \vdash 1 & \sqcap_L\\
\Diamond(A \sqcap (1 - A)), \Diamond(A \sqcap (1 - A)) \vdash 1 & \Diamond\text{-rule}\\
\Diamond(A \sqcap (1 - A)) + \Diamond(A \sqcap (1 - A)) \vdash 1 & +_L
\end{array}
$$

Our first theorem regarding MGA states its soundness and completeness with respect to the theory of modal lattice-ordered abelian groups (see Fig. 1). The proof of [MOG05] of Theorem 2 cannot be directly adapted here because, unlike the case of lattice-ordered abelian groups and R, we are not aware of any simple modal lattice-ordered abelian group which generates the whole variety.

**Theorem 4.** *For all formulas* A *and hypersequents* G*:*

*Soundness: if* ⊢MGA G *then* T ⊢ ⟦G⟧ ≥ 0*.*

*Completeness: if* T ⊢ A ≥ 0 *then* ⊢MGA (⊢ A)*.*

*Proof.* Soundness is proven by translating every MGA derivation d of G to a derivation π in equational logic of T ⊢ ⟦G⟧ ≥ 0. This is done by induction on the complexity of d. The difficult cases are those where d ends with an application of the S-rule, the M-rule or the ⊔*<sup>L</sup>* rule. The formalised proof is implemented in the Agda file Syntax/Agda/MGA-Cut/Soundness.agda in [Agd] and the type of the function is: soundness : (G : HSeq) → (MGA G) → botAG ≤S ⟦G⟧.

Conversely, completeness is proven by translating every equational logic derivation π of A = B into MGA derivations d1 and d2 of the (hyper)sequents A ⊢ B and B ⊢ A respectively. The proof goes by induction on π. First, MGA derivations are obtained for all axioms of Fig. 1. For example, for the axiom ♦(x + y) = ♦(x) + ♦(y) we can derive the (hyper)sequent ♦(x + y) ⊢ ♦(x) + ♦(y) as shown below (left-side). Translating applications of the rules *refl* and *sym* is simple. Translating applications of the *trans* rule is immediate using the *CUT* rule of MGA. To translate applications of the *ctxt* rule, it is sufficient to prove (by induction) a simple context lemma stating that if A ⊢ B is MGA derivable then so is C[A] ⊢ C[B]. Similarly, to translate applications of the *subst* rule, it is sufficient to prove (by induction) a simple substitution lemma stating that if G is MGA derivable then G[A/x] is also derivable, where G[A/x] is the hypersequent in which every occurrence of x is replaced by A.

Note that T ⊢ A ≥ 0 means that T ⊢ 0 = 0 ⊓ A. By the translation method outlined above, the (hyper)sequent 0 ⊢ 0 ⊓ A is MGA derivable. We can then get a MGA derivation of ⊢ A as follows (right-side):

Left-side, a derivation of ♦(x + y) ⊢ ♦(x) + ♦(y) (read top-down, each sequent following from the preceding one(s) by the indicated rule):

$$
\begin{array}{ll}
x \vdash x & \text{ID-ax}\\
y \vdash y & \text{ID-ax}\\
x, y \vdash x, y & \text{M}\\
x + y \vdash x, y & +_L\\
\Diamond(x + y) \vdash \Diamond(x), \Diamond(y) & \Diamond\text{-rule}\\
\Diamond(x + y) \vdash \Diamond(x) + \Diamond(y) & +_R
\end{array}
$$

Right-side, a derivation of ⊢ A from the derivable (hyper)sequent 0 ⊢ 0 ⊓ A:

$$
\begin{array}{ll}
A \vdash A & \text{ID-ax}\\
0 \vdash A \mid A \vdash A & \text{W}\\
0 \sqcap A \vdash A & \sqcap_L\\
0 \vdash A & \text{CUT (with } 0 \vdash 0 \sqcap A\text{)}\\
\vdash & \Delta\text{-ax}\\
\vdash 0 & 0_R\\
\vdash A & \text{CUT}
\end{array}
$$

The file Syntax/Agda/MGA-Cut/Completeness.agda in [Agd] contains the formalised proof and the type of the function is: completeness : (A : Term) → botAG ≤S A → MGA (head ([ ], [ ] :: A)).

*Remark 2.* The following natural-looking variant of the (♦-rule), allowing hypersequents with more than one component, is unsound:

$$\frac{G \mid F \vdash \Delta, n1}{G \mid \Diamond F \vdash \Diamond \Delta, n1}$$

Our main theorem regarding the system MGA is the cut-elimination theorem. We denote with MGA<sup>∗</sup> the system without the CUT rule.

**Theorem 5 (Cut-elimination).** *Any MGA-derivation of a hypersequent* G *can be effectively transformed into a* MGA∗*-derivation of* G*.*

Theorems 4 and 5 imply the statement of Theorem 1 in the Introduction.

#### **4 Overview of the Proof of the Cut-Elimination Theorem**

In this section we illustrate the structure of our proof of the cut-elimination theorem. We first explain the main ideas behind the proof of cut-elimination for GA of [MOG09, §5.2]. We then explain why these ideas are not directly applicable to the system MGA. Lastly, we discuss our key technical contribution, which makes it possible to adapt the proof method of [MOG09, §5.2] to prove the cut-elimination theorem for the MGA system.

#### **4.1 The CAN-Elimination Theorem for the System GA**

A key idea of [MOG09, §5.2] is to replace the CUT rule with an easier-to-handle rule called the *cancellation* (CAN) rule. The CUT rule can be derived from the CAN rule in the basic cut-free system GA<sup>∗</sup> as follows (right-side):

$$\dfrac{G \mid \Gamma, A \vdash A, \Delta}{G \mid \Gamma \vdash \Delta}\;\text{CAN} \qquad\qquad \dfrac{\dfrac{\dfrac{d_1}{G \mid \Gamma_1, A \vdash \Delta_1} \quad \dfrac{d_2}{G \mid \Gamma_2 \vdash A, \Delta_2}}{G \mid \Gamma_1, \Gamma_2, A \vdash A, \Delta_1, \Delta_2}\;\text{M}}{G \mid \Gamma_1, \Gamma_2 \vdash \Delta_1, \Delta_2}\;\text{CAN}$$

The cut-elimination theorem is obtained in [MOG09, §5.2] by proving a CAN-elimination theorem expressed as: if ⊢GA∗ G|Γ, A ⊢ A, Δ then ⊢GA∗ G|Γ ⊢ Δ.

The CAN-elimination theorem for the system GA is proved in three steps:

*Step A: proving the invertibility of all the logical rules* ([MOG09, Lemma 5.18]). Invertibility states that if the conclusion of a logical rule (for instance, G|Γ, A + B ⊢ Δ for the +*<sup>L</sup>* rule) is derivable without the CAN-rule, then all the premises (in this case G|Γ, A, B ⊢ Δ) are also derivable without the CAN-rule.

*Step B: proving the atomic CAN-elimination theorem* ([MOG09, Lemma 5.17]). This theorem deals with the special case of A being a variable and states that if d ⊢GA∗ G|Γ, x ⊢ x, Δ then ⊢GA∗ G|Γ ⊢ Δ. This theorem is proven by induction on d and is mostly straightforward: the only difficult case is when d finishes with an application of the M-rule. A separate technical result ([MOG09, Lemma 5.16]) is used to take care of this difficult case.

*Step C: proving the CAN-elimination theorem* ([MOG09, Theorem 5.19]). The CAN-elimination theorem states that if ⊢GA∗ G|Γ, A ⊢ A, Δ then ⊢GA∗ G|Γ ⊢ Δ. This proof is by induction on A:


#### **4.2 Issues in Adapting the Proof for the System MGA**

The proofs of [MOG09] can be adapted to the context of MGA without much difficulty to perform the first two steps:

**Theorem 6 (Invertibility of the logical rules).** *All logical rules (including the* ♦*-rule) are invertible in the system MGA*∗*.*

*Proof.* The same proof technique used in [MOG09] works. The main idea is, in order to deal easily with the (S) and the (C) rules, to prove a slightly stronger statement about the invertibility of more general rules. For instance, the generalisation of the rule +*<sup>L</sup>* is:

$$\frac{\left[\,\Gamma_i, n_iA, n_iB \vdash \Delta_i\,\right]_{i=1}^{k}}{\left[\,\Gamma_i, n_i(A+B) \vdash \Delta_i\,\right]_{i=1}^{k}}$$

**Theorem 7 (Atomic CAN-elimination theorem).** *If* ⊢MGA∗ Γ, x ⊢ x, Δ *then* ⊢MGA∗ Γ ⊢ Δ*.*

The complication comes from the third and last Step C. We want to prove that if ⊢MGA∗ G|Γ, A ⊢ A, Δ then ⊢MGA∗ G|Γ ⊢ Δ. An ordinary proof by induction on A could get stuck when A = ♦B. For instance, if the hypersequent is x, ♦B ⊢ ♦B, x, the invertibility of the ♦-rule cannot be used because of the syntactic constraints the ♦-rule imposes on its conclusion. Indeed, the invertibility of the ♦-rule states that if ⊢MGA∗ ♦Γ ⊢ ♦Δ then ⊢MGA∗ Γ ⊢ Δ, but x, ♦B ⊢ ♦B, x is not of this form because it contains the variable x.

For this reason, we deal with the case A = ♦B in a different way, using an induction argument on the derivation of G|Γ, A ⊢ A, Δ. In this argument, however, the M-rule is hard to deal with (as already remarked, it is a main source of complications also in the proof of atomic CAN-elimination of [MOG09, §5.2]).

Our main technical result is that the M-rule can be eliminated from a simple variant of the system MGA called MGA-SR (which stands for MGA with *scalar rules*). The system MGA-SR is obtained by modifying MGA as follows:

– The logical left-rules and right-rules for the connectives {0, −, +, ⊔, ⊓} are generalised to deal with scalar coefficients (syntactic sugaring introduced in Sect. 3). For instance, the rules +*<sup>L</sup>* and ⊔*<sup>L</sup>* become:

$$\frac{G \mid \Gamma, nA, nB \vdash \Delta}{G \mid \Gamma, n(A+B) \vdash \Delta} +\_L \quad \frac{G \mid \Gamma, nA \vdash \Delta \quad G \mid \Gamma, nB \vdash \Delta}{G \mid \Gamma, n(A \sqcup B) \vdash \Delta} \quad \sqcup\_L$$

– The axioms ID-ax and 1-ax are replaced by the rules

$$\frac{G|\varGamma \vdash \Delta}{G|\varGamma, nA \vdash nA, \Delta} \text{ ID-rule} \quad \frac{G \mid \varGamma \vdash \Delta}{G \mid \varGamma \vdash \Delta, n1} \text{ 1-rule}$$

– All structural rules (C, W, S, M), the ♦-rule and the CAN rule remain exactly as in MGA (see Fig. 2).

It is possible to verify that MGA and MGA-SR are equivalent, i.e., they can derive exactly the same hypersequents (Theorem 8 below). The first modification (scalar rules) is technically motivated because it simplifies several proofs: in fact, scalar rules are also implicitly considered in several of the proofs of [MOG09] for the system GA. The second modification (ID-rule and 1-rule) is essential. Indeed, in the system MGA (and also in GA) the (hyper)sequent x, y ⊢ x, y is not derivable without applying the M-rule. Hence M-elimination in MGA is impossible. On the other hand, the (hyper)sequent x, y ⊢ x, y is easily derivable in MGA-SR without applications of the M-rule:

$$\dfrac{\dfrac{\dfrac{}{\vdash}\;\Delta\text{-ax}}{y \vdash y}\;\text{ID-rule}}{x, y \vdash x, y}\;\text{ID-rule}$$

and, as we will prove (Theorem 12), it is indeed possible to eliminate all applications of the M-rule from MGA-SR.

As outlined above, the presence of the M-rule was the main source of complications in adapting Step C. Once the equivalence between MGA-SR and MGA-SR without the M-rule is established, most complications disappear and the CAN-elimination proof can be obtained by performing Steps A–B–C for the system MGA-SR.

#### **4.3 The System MGA-SR and the M-Elimination Theorem**

In this subsection we introduce the system MGA-SR (MGA with *scalar rules*) for which we will prove the M-elimination theorem.

**Definition 8 (MGA-SR).** *The inference rules of MGA-SR are the rules of MGA modified as discussed previously. We denote by MGA-SR*∗*, MGA-SR*† *and MGA-SR*†∗ *the systems without the CUT rule, the M-rule and both the CUT and M-rules, respectively.*

**Theorem 8.** *The two systems MGA and MGA-SR are equivalent:* ⊢MGA G *if and only if* ⊢MGA-SR G*.*

*The two systems MGA*<sup>∗</sup> *and MGA-SR*<sup>∗</sup> *are equivalent:* ⊢MGA∗ G *if and only if* ⊢MGA-SR∗ G*.*

*Proof.* Translating MGA proofs to MGA-SR proofs is straightforward. All rules of MGA are specific instances of the scalar rules of MGA-SR (taking the scalar n = 1), and the axioms 1-ax and ID-ax are easily derivable in MGA-SR (without the need of the CAN rule) by using the ID-rule and the 1-rule (again, with the scalar n = 1). Translating MGA-SR to MGA is also mostly straightforward. Some care is needed to translate instances of the scalar rules ⊔*<sup>L</sup>* and ⊓*<sup>R</sup>* from MGA-SR to MGA. This can be done by induction on the scalar n, using the fact that the two premises G|Γ, nA, B ⊢ Δ and G|Γ, nB, A ⊢ Δ are derivable from G|Γ, (n + 1)A ⊢ Δ and G|Γ, (n + 1)B ⊢ Δ. We remark that this derivation may require the usage of the M rule.

We now state our main technical contribution: the M-elimination theorem for the system MGA-SR.

**Theorem 9 (M-elimination).** *If* d1 ⊢MGA-SR† G1 | Γ ⊢ Δ *and* d2 ⊢MGA-SR† G2 | Σ ⊢ Π *then* ⊢MGA-SR† G1 | G2 | Γ, Σ ⊢ Δ, Π*. If* d1 ⊢MGA-SR†∗ G1 | Γ ⊢ Δ *and* d2 ⊢MGA-SR†∗ G2 | Σ ⊢ Π *then* ⊢MGA-SR†∗ G1 | G2 | Γ, Σ ⊢ Δ, Π*.*

We now give a sketch of our proof argument. A formalised proof in Agda is available in [Agd] and is contained in the files Syntax/MGA-SR/M-Elim.agda and Syntax/MGA-SR-CAN/M-Elim-CAN.agda.

The general idea is to combine the derivations d1 and d2 in a *sequential way*. We first consider the case when no applications of the ♦-rule appear in d1 or d2. First, the proof d1 is transformed into a pre-proof (i.e., a derivation left incomplete at some leaves) d′1 of G1 | G2 | Γ, Σ ⊢ Δ, Π. The pre-proof d′1 is structurally identical to d1: it essentially just ignores the G2, Σ and Π components of the hypersequent. While the leaves of d1 are all of the form (⊢), because Δ-ax is the only axiom of MGA-SR, the leaves of the pre-proof d′1 are of the form G2 | nΣ ⊢ nΠ (the ignored part, carried along until the end, possibly multiplied by applications of the C and S rules). We can then proceed with the second step and provide derivations for these leaves using (easily modified versions of) the proof d2.

When occurrences of the ♦-rule appear in d₁ or d₂, the argument requires more care. Indeed, an application of the ♦-rule in d₁, acting on a hypersequent (necessarily) of the form:

$$
\Diamond \Gamma_1 \vdash \Diamond \Delta_1, k1
$$

cannot be turned into an application of the ♦-rule on:

$$G_2 \mid n\Sigma, \Diamond \Gamma_1 \vdash \Diamond \Delta_1, k1, n\Pi$$

because this hypersequent violates the structural constraints of the ♦-rule. For this reason, we stop the construction of d′₁ at these points and, as a result, the leaves of the pre-proof d′₁ are generally of the form G₂ | nΣ, ♦Γ₁ ⊢ ♦Δ₁, k1, nΠ, for some Γ₁, Δ₁ and scalars n, k.

The idea now is, following the same kind of procedure, to modify the proof d₂ and turn it into a pre-proof d′₂ of G₂ | nΣ, ♦Γ₁ ⊢ ♦Δ₁, k1, nΠ. Crucially, the previous issue disappears. Indeed, proof steps in d₂ acting on hypersequents of the form:

$$
\Diamond \Sigma_1 \vdash \Diamond \Pi_1, m1
$$

using the ♦-rule, can be turned into valid ♦-rule steps for the extended hypersequent:

$$
\Diamond \Sigma_1, \Diamond \Gamma_1 \vdash \Diamond \Delta_1, k1, \Diamond \Pi_1, m1
$$

because the shape of the sequent is compatible with the constraint of the ♦-rule. Note that the hypersequent resulting from the application of the ♦-rule is Σ₁, Γ₁ ⊢ Δ₁, k1, Π₁, m1 and has a lower modal depth than the starting one. Hence an inductive argument on modal complexity can be arranged to recursively reduce the general M-elimination procedure to the simpler case where d₁ and d₂ contain no occurrences of the ♦-rule (Fig. 4).

**Fig. 4.** Sequentially composing *<sup>d</sup>*<sup>1</sup> and *<sup>d</sup>*<sup>2</sup> in the M-elimination proof.

The following is a direct consequence of Theorems 8 and 9.

**Corollary 1.** *The two systems MGA and MGA-SR*† *are equivalent: MGA derives* G *if and only if MGA-SR*† *derives* G*.*

*The two systems MGA*<sup>∗</sup> *and MGA-SR*†∗ *are equivalent: MGA*<sup>∗</sup> *derives* G *if and only if MGA-SR*†∗ *derives* G*.*

#### **4.4 Cut-Elimination Theorem for the System MGA**

We have already remarked that the cut-elimination theorem for the system MGA follows from the CAN-elimination theorem. By Corollary 1, the CAN-elimination theorem for the system MGA-SR† implies the CAN-elimination for MGA. Since there is no M-rule in MGA-SR†, the proof of CAN-elimination can follow the three Steps A–B–C outlined in Subsect. 4.1. As for Step A, we need to prove the invertibility of the logical rules in the system MGA-SR†∗.

**Theorem 10 (Invertibility of the logical rules).** *The logical rules of the system MGA-SR*†∗*,* {0*<sub>L</sub>*, 0*<sub>R</sub>*, +*<sub>L</sub>*, +*<sub>R</sub>*, ⊔*<sub>L</sub>*, ⊔*<sub>R</sub>*, ⊓*<sub>L</sub>*, ⊓*<sub>R</sub>*}*, are invertible.*

*Remark 3.* We note that, just as in [MOG09, §5.2], it is in fact possible, and indeed technically useful, to prove the invertibility of generalised versions of the logical rules dealing with scalars, as in the proof of Theorem 6.

As for Step B we prove the atomic CAN-elimination theorem. Following the previous remark, we prove the following stronger version of the statement.

**Theorem 11 (Atomic CAN-elimination).** *If MGA-SR*†∗ *derives* [Γ_i, k_i x ⊢ k_i x, Δ_i]_{i=1}^{n} *then MGA-SR*†∗ *derives* [Γ_i ⊢ Δ_i]_{i=1}^{n}*.*

Since we removed the M-rule, there are no significant difficulties in the induction arguments, and the proof is quite straightforward.

We also need a technical lemma regarding the constant formula 1 which is provable by a simple induction on the length of derivations.

**Lemma 1.** *If MGA-SR*†∗ *derives* [Γ_i, n_i 1 ⊢ n_i 1, Δ_i]_{i=1}^{n} *then MGA-SR*†∗ *derives* [Γ_i ⊢ Δ_i]_{i=1}^{n}*.*

We can now prove the CAN-elimination theorem for MGA-SR†. This, together with Corollary 1, implies cut-elimination (Theorem 5) for MGA.

**Theorem 12 (CAN-elimination).** *If* d *is a MGA-SR*†∗ *derivation of* G | Γ, A ⊢ A, Δ *then MGA-SR*†∗ *derives* G | Γ ⊢ Δ*.*

*Proof.* Again, it is convenient to prove the stronger statement: if d is a MGA-SR†∗ derivation of [Γ_i, k_i A ⊢ k_i A, Δ_i]_{i=1}^{n} then MGA-SR†∗ derives [Γ_i ⊢ Δ_i]_{i=1}^{n}. This is done by induction on the (lexicographically ordered) complexity of the pair (A, d):

	- If d ends with the ♦-rule, then the end hypersequent is necessarily of the form [Γ_i, k_i A ⊢ k_i A, Δ_i]_{i=1}^{n} = ♦Γ₁, n₁♦B ⊢ n₁♦B, ♦Δ₁, k1, and is derived from a MGA-SR†∗ derivation of Γ₁, n₁B ⊢ n₁B, Δ₁, k1. By the induction hypothesis (B has smaller complexity than A = ♦B), MGA-SR†∗ derives Γ₁ ⊢ Δ₁, k1. Hence we can derive ♦Γ₁ ⊢ ♦Δ₁, k1 in MGA-SR†∗ by an application of the ♦-rule.
	- Otherwise, the hypersequent is derived by an application of some other rule (not active on A = ♦B) from some premises. In this case, we simply apply the induction hypothesis to the premises (the formula A is unchanged but the complexity of the premise derivations has decreased) and use the same rule to construct a derivation of the desired hypersequent.

#### **5 Conclusions and Future Work**

We have presented a structural proof system called MGA for the scalar-free fragment of the Riesz modal logic. A natural direction of research is to extend the system MGA to deal with the full Riesz modal logic, thus handling arbitrary scalars r ∈ ℝ. The (integer-)scalar rules of the system MGA-SR could be naturally generalised to handle real scalars, but it is not clear, at present, whether the resulting system would satisfy a reasonable formulation of the sub-formula property. Another interesting topic of research is to consider extensions of MGA for fixed-point extensions of the Riesz modal logic (e.g., [MS17,Mio18]). In this direction, the machinery of *cyclic proofs* (see, e.g., [Stu07,MS13b,BS11,Dou17]) appears to be particularly promising.

#### **References**

	- [BK08] Baier, C., Katoen, J.P.: Principles of Model Checking. The MIT Press, Cambridge (2008)
	- [BS11] Brotherston, J., Simpson, A.: Sequent calculi for induction and infinite descent. J. Log. Comput. **21**(6), 1177–1216 (2011)
	- [Bus98] Buss, S.R.: An introduction to proof theory. In: Handbook of Proof Theory, pp. 1–78. Elsevier (1998)
	- [CM03] Ciabattoni, A., Metcalfe, G.: Bounded Łukasiewicz logics. In: Cialdea Mayer, M., Pirri, F. (eds.) TABLEAUX 2003. LNCS (LNAI), vol. 2796, pp. 32–47. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45206-5_6
	- [dA03] de Alfaro, L.: Quantitative verification and control via the *µ*-calculus. In: Amadio, R., Lugiez, D. (eds.) CONCUR 2003. LNCS, vol. 2761, pp. 103–127. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45187-7_7
	- [Mio11] Mio, M.: Probabilistic modal *µ*-calculus with independent product. In: Hofmann, M. (ed.) FoSSaCS 2011. LNCS, vol. 6604, pp. 290–304. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19805-2_20

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Partial and Conditional Expectations in Markov Decision Processes with Integer Weights**

Jakob Piribauer(B) and Christel Baier

Technische Universität Dresden, Dresden, Germany {jakob.piribauer,christel.baier}@tu-dresden.de

**Abstract.** The paper addresses two variants of the stochastic shortest path problem ("optimize the accumulated weight until reaching a goal state") in Markov decision processes (MDPs) with integer weights. The first variant optimizes partial expected accumulated weights, where paths not leading to a goal state are assigned weight 0, while the second variant considers conditional expected accumulated weights, where the probability mass is redistributed to paths reaching the goal. Both variants constitute useful approaches to the analysis of systems without guarantees on the occurrence of an event of interest (reaching a goal state), but have only been studied in structures with non-negative weights. Our main results are as follows. There are polynomial-time algorithms to check the finiteness of the supremum of the partial or conditional expectations in MDPs with arbitrary integer weights. If finite, then optimal weight-based deterministic schedulers exist. In contrast to the setting of non-negative weights, optimal schedulers can need infinite memory and their value can be irrational. However, the optimal value can be approximated up to an absolute error of ε in time exponential in the size of the MDP and polynomial in log(1/ε).

### **1 Introduction**

Stochastic shortest path (SSP) problems generalize the shortest path problem on graphs with weighted edges. The SSP problem is formalized using finite state Markov decision processes (MDPs), which are a prominent model combining probabilistic and nondeterministic choices. In each state of an MDP, one is allowed to choose nondeterministically from a set of actions, each of which is augmented with a probability distribution over the successor states and a weight (cost or reward). The SSP problem asks for a policy to choose actions (here called a scheduler) maximizing or minimizing the expected accumulated weight until reaching a goal state. In the classical setting, one seeks an optimal *proper* scheduler where proper means that a goal state is reached almost surely. Polynomial-time solutions exist, exploiting the fact that optimal memoryless deterministic

The authors are supported by the DFG through the Research Training Group QuantLA (GRK 1763), the DFG-project BA-1679/11-1, the Collaborative Research Center HAEC (SFB 912), and the cluster of excellence CeTI.

schedulers exist (provided the optimal value is finite) and can be computed using linear programming techniques, possibly in combination with model transformations (see [1,5,10]). The restriction to proper schedulers, however, is often too restrictive. First, there are models that have no proper scheduler. Second, even if proper schedulers exist, the expectation of the accumulated weight of schedulers missing the goal with a positive probability should be taken into account as well. Important such applications include the semantics of probabilistic programs (see e.g. [4,7,12,14,16]) where no guarantee for almost sure termination can be given and the analysis of program properties at termination time gives rise to stochastic shortest (longest) path problems in which the goal (halting configuration) is not reached almost surely. Other examples are the fault-tolerance analysis (e.g., expected costs of repair mechanisms) in selected error scenarios that can appear with some positive, but small probability or the trade-off analysis with conjunctions of utility and cost constraints that are achievable with positive probability, but not almost surely (see e.g. [2]).

This motivates the switch to variants of classical SSP problems where the restriction to proper schedulers is relaxed. One option (e.g., considered in [8]) is to seek a scheduler optimizing the expectation of the random variable that assigns weight 0 to all paths not reaching the goal and the accumulated weight of the shortest prefix reaching the goal to all other paths. We refer to this expectation as *partial expectation*. Second, we consider the *conditional expectation* of the accumulated weight until reaching the goal under the condition that the goal is reached. In general, partial expectations describe situations in which some reward (positive and negative) is accumulated but only retrieved if a certain goal is met. In particular, partial expectations can be an appropriate replacement for the classical expected weight before reaching the goal if we want to include schedulers which miss the goal with some (possibly very small) probability. In contrast to conditional expectations, the resulting scheduler still has an incentive to reach the goal with a high probability, while schedulers maximizing the conditional expectation might reach the goal with a very small positive probability.

Previous work on partial or conditional expected accumulated weights was restricted to the case of non-negative weights. More precisely, partial expectations have been studied in the setting of stochastic multiplayer games with non-negative weights [8]. Conditional expectations in MDPs with non-negative weights have been addressed in [3]. In both cases, optimal values are achieved by weight-based deterministic schedulers that depend on the current state and the weight that has been accumulated so far, while memoryless schedulers are not sufficient. Both [8] and [3] prove the existence of a *saturation point* for the accumulated weight from which on optimal schedulers behave memorylessly and maximize the probability to reach a goal state. This yields exponential-time algorithms for computing optimal schedulers using an iterative linear programming approach. Moreover, [3] proves that the threshold problem for conditional expectations ("does there exist a scheduler S such that the conditional expectation under S exceeds a given threshold?") is PSPACE-hard even for acyclic MDPs.

The purpose of the paper is to study partial and conditional expected accumulated weights for MDPs with integer weights. The switch from non-negative to integer weights indeed causes several additional difficulties. We start with the following observation. While optimal partial or conditional expectations in non-negative MDPs are rational, they can be irrational in the general setting:

**Fig. 1.** Enabled actions are denoted by Greek letters and the weight associated to the action is stated after the bar. Probabilistic choices are marked by a bold arc and transition probabilities are denoted next to the arrows.

**Example 1.** Consider the MDP M depicted on the left in Fig. 1. In the initial state s*init*, two actions are enabled. Action τ leads to *goal* with probability 1 and weight 0. Action σ leads to the states s and t, each with probability 1/2, from where we return to s*init* with weight −2 or +1, respectively. The scheduler choosing τ immediately leads to an expected weight of 0 and is optimal among schedulers reaching the goal almost surely. As long as we choose σ in s*init*, the accumulated weight follows an asymmetric random walk increasing by 1 or decreasing by 2, each with probability 1/2, before we return to s*init*. It is well known that the probability to ever reach accumulated weight +1 in this random walk is 1/Φ, where Φ = (1+√5)/2 is the golden ratio. Likewise, ever reaching accumulated weight n has probability 1/Φ^n for all n ∈ ℕ. Consider the scheduler S_k choosing τ as soon as the accumulated weight reaches k in s*init*. Its partial expectation is k/Φ^k, as the paths which never reach weight k are assigned weight 0. The maximum is reached at k = 2. In Sect. 4, we prove that there are optimal schedulers whose decisions only depend on the current state and the weight accumulated so far. With this result we can conclude that the maximal partial expectation is indeed 2/Φ², an irrational number.

The conditional expectation of S_k in M is k, as S_k reaches the goal with accumulated weight k whenever it reaches the goal. So, the conditional expectation is not bounded. If we add a new initial state making sure that the goal is reached with positive probability, as in the MDP N, we can obtain an irrational maximal conditional expectation as well: the scheduler T_k choosing τ in c as soon as the weight reaches k has conditional expectation (k/(2Φ^k)) / (1/2 + 1/(2Φ^k)). The maximum is obtained for k = 3; the maximal conditional expectation is (3/Φ³)/(1 + 1/Φ³) = 3/(3+√5).
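These closed-form values are easy to check numerically. The following sketch (the helper names are ours; it assumes the reach probabilities 1/Φ^k stated in the example) evaluates the partial expectation k/Φ^k of S_k and the conditional expectation of T_k over a range of k:

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # golden ratio

def partial_expectation(k):
    # S_k: paths that never reach weight k contribute 0, so PE = k * (1/PHI**k).
    return k / PHI**k

def conditional_expectation(k):
    # T_k in N: goal reached directly with prob 1/2 and weight 0, or via the
    # random walk with prob (1/2) * (1/PHI**k) and accumulated weight k.
    return (k / (2 * PHI**k)) / (1 / 2 + 1 / (2 * PHI**k))

best_pe = max(range(1, 50), key=partial_expectation)
best_ce = max(range(1, 50), key=conditional_expectation)
print(best_pe, partial_expectation(best_pe))    # maximum at k = 2, value 2/PHI**2
print(best_ce, conditional_expectation(best_ce))  # maximum at k = 3, value 3/(3+sqrt(5))
```

Both maxima agree with the values stated above, and both optimal values are irrational.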

Moreover, while the proposed algorithms of [3,8] crucially rely on the monotonicity of the accumulated weights along the prefixes of paths, the accumulated weights of prefixes of a path can oscillate when there are positive and negative weights. As we will see later, this implies that the existence of saturation points is no longer ensured and optimal schedulers might require infinite memory (more precisely, a counter for the accumulated weight). These observations provide evidence why linear-programming techniques as used in the case of non-negative MDPs [3,8] cannot be expected to be applicable in the general setting.

**Contributions.** We study the problem of maximizing the partial and conditional expected accumulated weight in MDPs with integer weights. Our first result is that the finiteness of the supremum of partial and conditional expectations in MDPs with integer weights can be checked in polynomial time (Sect. 3). For both variants we show that there are optimal weight-based deterministic schedulers if the supremum is finite (Sect. 4). Although the suprema might be irrational and optimal schedulers might need infinite memory, the suprema can be ε-approximated in time exponential in the size of the MDP and polynomial in log(1/ε) (Sect. 5). By duality of maximal and minimal expectations, analogous results hold for the problem of minimizing the partial or conditional expected accumulated weight. (Note that we can multiply all weights by −1 and then apply the results for maximal partial resp. conditional expectations.)

**Related Work.** Closest to our contribution is the above mentioned work on partial expected accumulated weights in stochastic multiplayer games with nonnegative weights in [8] and on computation schemes for maximal conditional expected accumulated weights in non-negative MDPs [3]. Conditional expected termination time in probabilistic push-down automata has been studied in [11], which can be seen as analogous considerations for a class of infinite-state Markov chains with non-negative weights. The recent work on notions of conditional value at risk in MDPs [15] also studies conditional expectations, but the considered random variables are limit averages and a notion of (non-accumulated) weight-bounded reachability.

# **2 Preliminaries**

We give basic definitions and present our notation. More details can be found in textbooks, e.g. [18].

**Notations for Markov Decision Processes.** A *Markov decision process* (MDP) is a tuple M = (S, *Act*, P, s*init*, *wgt*) where S is a finite set of states, *Act* a finite set of actions, s*init* ∈ S the initial state, P : S × *Act* × S → [0, 1] ∩ ℚ the transition probability function and *wgt* : S × *Act* → ℤ the weight function. We require that Σ_{t∈S} P(s, α, t) ∈ {0, 1} for all (s, α) ∈ S × *Act*. We write *Act*(s) for the set of actions that are enabled in s, i.e., α ∈ *Act*(s) iff Σ_{t∈S} P(s, α, t) = 1. We assume that *Act*(s) is non-empty for all s and that all states are reachable from s*init*. We call a state absorbing if the only enabled action leads to the state itself with probability 1 and weight 0. The paths of M are finite or infinite sequences s₀ α₀ s₁ α₁ s₂ α₂ … where states and actions alternate such that P(s_i, α_i, s_{i+1}) > 0 for all i ≥ 0. If π = s₀ α₀ s₁ α₁ … α_{k−1} s_k is finite, then *wgt*(π) = *wgt*(s₀, α₀) + … + *wgt*(s_{k−1}, α_{k−1}) denotes the accumulated weight of π, P(π) = P(s₀, α₀, s₁) · … · P(s_{k−1}, α_{k−1}, s_k) its probability, and *last*(π) = s_k its last state. The *size* of M, denoted *size*(M), is the number of states plus the total sum of the logarithmic lengths of the non-zero probability values P(s, α, s′), given as fractions of co-prime integers, and of the weight values *wgt*(s, α).
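As an illustration of these definitions, here is a minimal Python encoding of an MDP; the class layout and the use of exact `Fraction` probabilities are our choices, not the paper's. It checks the requirement Σ_t P(s, α, t) ∈ {0, 1} and computes Act(s):

```python
from fractions import Fraction
from dataclasses import dataclass

@dataclass
class MDP:
    states: frozenset
    actions: frozenset
    P: dict       # (s, alpha, t) -> Fraction in [0, 1]
    s_init: str
    wgt: dict     # (s, alpha) -> integer weight

    def mass(self, s, a):
        # Total outgoing probability mass of (s, a).
        return sum((self.P.get((s, a, t), Fraction(0)) for t in self.states),
                   Fraction(0))

    def act(self, s):
        """Act(s): actions whose probabilities in s sum to exactly 1."""
        return {a for a in self.actions if self.mass(s, a) == 1}

    def check(self):
        for s in self.states:
            # Every (s, alpha) must have probability mass 0 or 1 ...
            assert all(self.mass(s, a) in (0, 1) for a in self.actions)
            # ... and Act(s) must be non-empty.
            assert self.act(s)

# A two-state toy: from s, alpha reaches goal or loops back; goal is absorbing.
m = MDP(states=frozenset({"s", "goal"}),
        actions=frozenset({"alpha", "loop"}),
        P={("s", "alpha", "goal"): Fraction(1, 2),
           ("s", "alpha", "s"): Fraction(1, 2),
           ("goal", "loop", "goal"): Fraction(1)},
        s_init="s",
        wgt={("s", "alpha"): 3, ("goal", "loop"): 0})
m.check()
print(m.act("s"))  # {'alpha'}
```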

**Scheduler.** A *(history-dependent, randomized) scheduler* for M is a function S that assigns to each finite path π a probability distribution over *Act*(*last*(π)). S is called memoryless if S(π) = S(π′) for all finite paths π, π′ with *last*(π) = *last*(π′), in which case S can be viewed as a function that assigns to each state s a distribution over *Act*(s). S is called deterministic if S(π) is a Dirac distribution for each path π, in which case S can be viewed as a function that assigns an action to each finite path π. Scheduler S is said to be *weight-based* if S(π) = S(π′) for all finite paths π, π′ with *wgt*(π) = *wgt*(π′) and *last*(π) = *last*(π′). Thus, deterministic weight-based schedulers can be viewed as functions that assign actions to state-weight pairs. By *HR*_M we denote the class of all schedulers, by *WR*_M the class of weight-based schedulers, by *WD*_M the class of weight-based deterministic schedulers, and by *MD*_M the class of memoryless deterministic schedulers. Given a scheduler S, ς = s₀ α₀ s₁ α₁ … is an S-path iff ς is a path and S(s₀ α₀ s₁ α₁ … α_{k−1} s_k)(α_k) > 0 for all k ≥ 0.

**Probability Measure.** We write Pr^S_{M,s}, or briefly Pr^S_s, to denote the probability measure induced by S and s. For details, see [18]. We will use LTL-like formulas to denote measurable sets of paths and also write ♦(wgt ⋈ x) for the set of infinite paths having a prefix π with wgt(π) ⋈ x, for x ∈ ℤ and ⋈ ∈ {<, ≤, =, ≥, >}. Given a measurable set ψ of infinite paths, we define Pr^min_{M,s}(ψ) = inf_S Pr^S_{M,s}(ψ) and Pr^max_{M,s}(ψ) = sup_S Pr^S_{M,s}(ψ), where S ranges over all schedulers for M. Throughout the paper, we suppose that the given MDP has a designated state *goal*. Then p^max_s and p^min_s denote the maximal resp. minimal probability of reaching *goal* from s. That is, p^max_s = sup_S Pr^S_s(♦*goal*) and p^min_s = inf_S Pr^S_s(♦*goal*). Let Act^max(s) = {α ∈ Act(s) | Σ_{t∈S} P(s, α, t) · p^max_t = p^max_s} and Act^min(s) = {α ∈ Act(s) | Σ_{t∈S} P(s, α, t) · p^min_t = p^min_s}.

**Mean Payoff.** A well-known measure for the long-run behavior of a scheduler S in an MDP M is the *mean payoff*. Intuitively, the mean payoff is the amount of weight accumulated per step on average in the long run. Formally, we define the mean payoff as the following random variable on infinite paths ζ = s₀α₀s₁α₁…: *MP*(ζ) := lim inf_{k→∞} (Σ_{i=0}^{k} wgt(s_i, α_i)) / (k+1). The mean payoff of the scheduler S starting in s*init* is then defined as the expected value E^S_{s_init}(*MP*). The maximal mean payoff is the supremum over all schedulers, which is equal to the maximum over all MD-schedulers: E^max_{s_init}(*MP*) = max_{S∈MD} E^S_{s_init}(*MP*). In strongly connected MDPs, the maximal mean payoff does not depend on the initial state.
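To illustrate the definition on a concrete path (the toy weights below are ours): for a path that eventually just repeats a fixed cycle of actions, the lim inf of the running averages is simply the average weight of that cycle.

```python
def running_average(weights, k):
    # (sum_{i=0}^{k} wgt(s_i, alpha_i)) / (k + 1), as in the definition of MP.
    return sum(weights[:k + 1]) / (k + 1)

cycle = [1, -2, 4]            # weights collected along one loop of the cycle
path = cycle * 2000           # a path that keeps repeating the cycle
mp = sum(cycle) / len(cycle)  # mean payoff of such a lasso-shaped path: 1.0
approx = running_average(path, len(path) - 1)
print(mp, approx)  # the running average converges to the cycle average
```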

**End Components, MEC-Quotient.** An *end component* of M is a strongly connected sub-MDP. End components can be formalized as pairs E = (E, A) where E is a nonempty subset of S and A a function that assigns to each state s ∈ E a nonempty subset of *Act*(s) such that the graph induced by E is strongly connected. E is called *maximal* if there is no end component E′ = (E′, A′) with E ≠ E′, E ⊆ E′ and A(s) ⊆ A′(s) for all s ∈ E. The *MEC-quotient* of an MDP M is the MDP *MEC*(M) arising from M by collapsing all states that belong to the same maximal end component E to a single state s_E. All actions enabled in some state in E not belonging to E are enabled in s_E. Details and the formal construction can be found in [9]. We call an end component E *positively weight-divergent* if there is a scheduler S for E such that Pr^S_{E,s}(♦(*wgt* ≥ n)) = 1 for all s ∈ E and n ∈ ℕ. In [1], it is shown that the existence of positively weight-divergent end components can be decided in polynomial time.

### **3 Partial and Conditional Expectations in MDPs**

We define *partial* and *conditional expectations* in MDPs. We extend the definition of [8] by introducing partial expectations with *bias* which are closely related to conditional expectations. Afterwards, we sketch the computation of maximal partial expectations in MDPs with non-negative weights and in Markov chains.

**Partial and Conditional Expectation.** In the sequel, let M be an MDP with a designated absorbing goal state *goal*. Furthermore, we collapse all states from which *goal* is not reachable into one absorbing state *fail*. Let b ∈ ℝ. We define the random variable ⊕^b *goal* on infinite paths ζ by

$$\oplus^b goal(\zeta) = \begin{cases} wgt(\zeta) + b & \text{if } \zeta \models \Diamond goal, \\ 0 & \text{if } \zeta \not\models \Diamond goal. \end{cases}$$

We call the expectation of this random variable under a scheduler S the *partial expectation with bias* b of S and write *PE*^S_{M,s_init}[b] := E^S_{M,s_init}(⊕^b *goal*) as well as *PE*^sup_{M,s_init}[b] := sup_{S∈HR_M} *PE*^S_{M,s_init}[b]. If b = 0, we sometimes drop the argument b; if M is clear from the context, we drop the subscript. In order to maximize the partial expectation, intuitively one has to find the right balance between reaching *goal* with high probability and accumulating a high positive amount of weight before reaching *goal*. The bias can be used to shift this balance by additionally rewarding or penalizing a high probability to reach *goal*.

The *conditional expectation* of S is defined as the expectation of ⊕^0 *goal* under the condition that *goal* is reached. It is defined if Pr^S_{M,s_init}(♦*goal*) > 0. We write *CE*^S_{M,s_init} := E^S_{M,s_init}(⊕^0 *goal* | ♦*goal*) and *CE*^sup_{M,s_init} := sup_S *CE*^S_{M,s_init}, where the supremum is taken over all schedulers S with Pr^S_{M,s_init}(♦*goal*) > 0. We can express the conditional expectation as *CE*^S_{M,s_init} = *PE*^S_{M,s_init} / Pr^S_{M,s_init}(♦*goal*). The following proposition establishes a close connection between conditional expectations and partial expectations with bias.

**Proposition 2.** *Let* M *be an MDP,* S *a scheduler with* Pr^S_{s_init}(♦*goal*) > 0*,* θ ∈ ℚ*, and* ⋈ ∈ {<, ≤, ≥, >}*. Then we have PE*^S_{s_init}[−θ] ⋈ 0 *iff CE*^S_{s_init} ⋈ θ*. Further, if* Pr^min_{s_init}(♦*goal*) > 0*, then PE*^sup_{s_init}[−θ] ⋈ 0 *iff CE*^sup_{s_init} ⋈ θ*.*

*Proof.* The first claim follows from *PE*^S_{s_init}[−θ] = *PE*^S_{s_init}[0] − Pr^S_{s_init}(♦*goal*) · θ. The second claim follows by quantification over all schedulers.
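The identity in the proof is easy to sanity-check on a toy distribution over S-paths; the numbers below are made up for illustration:

```python
# Each entry: (probability, accumulated weight, reaches goal?)
paths = [(0.5, 3, True), (0.3, -1, True), (0.2, 7, False)]

def pe(b):
    # Partial expectation with bias b: goal-paths contribute wgt + b, others 0.
    return sum(p * (w + b) for p, w, g in paths if g)

p_goal = sum(p for p, _, g in paths if g)
ce = pe(0) / p_goal                      # conditional expectation
theta = 2.0
# PE[-theta] = PE[0] - Pr(goal) * theta  (the identity used in the proof)
assert abs(pe(-theta) - (pe(0) - p_goal * theta)) < 1e-12
# Hence PE[-theta] >= 0 iff CE >= theta.
assert (pe(-theta) >= 0) == (ce >= theta)
```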

In [3], it is shown that deciding whether *CE*^sup_{s_init} ⋈ θ for ⋈ ∈ {<, ≤, ≥, >} and θ ∈ ℚ is PSPACE-hard even for acyclic MDPs. We conclude:

**Corollary 3.** *Given an MDP* M*,* ⋈ ∈ {<, ≤, ≥, >}*, and* θ ∈ ℚ*, deciding whether PE*^sup_{M,s_init} ⋈ θ *is PSPACE-hard.*

**Finiteness.** We present criteria for the finiteness of *PE*^sup_{s_init}[b] and *CE*^sup_{s_init}. Detailed proofs can be found in Appendix A.1 of [17]. By slightly modifying the construction from [1], which removes end components only containing 0-weight cycles, we obtain the following result.

**Proposition 4.** *Let* M *be an MDP which does not contain positively weight-divergent end components and let* b ∈ ℚ*. Then there is a polynomial-time transformation to an MDP* N *containing all states from* M *and possibly an additional absorbing state fail such that*


Hence, if there are no positively weight-divergent end components, we can restrict ourselves to MDPs in which all end components have negative maximal expected mean payoff. The following result is now analogous to the result in [1] for the classical SSP problem.

**Proposition 5.** *Let* M *be an MDP and* b ∈ ℝ *arbitrary. The optimal partial expectation PE*^sup_{s_init}[b] *is finite if and only if there are no positively weight-divergent end components in* M*.*

To obtain an analogous result for conditional expectations, we observe that the finiteness of the maximal partial expectation is necessary for the finiteness of the maximal conditional expectation. However, it is not sufficient. In [3], a *critical scheduler* is defined as a scheduler S for which there is a path containing a positive cycle and for which Pr^S_{s_init}(♦*goal*) = 0. Given a critical scheduler, it is easy to construct a sequence of schedulers with unbounded conditional expectation (see Appendix A.1 of [17] and [3]). On the other hand, if Pr^min_{M,s_init}(♦*goal*) > 0, then *CE*^sup_{s_init} is finite if and only if *PE*^sup_{s_init} is finite. We will show how we can restrict ourselves to this case if there are no critical schedulers:

So, let M be an MDP with Pr^min_{M,s_init}(♦*goal*) = 0 and suppose there are no critical schedulers for M. Let S₀ be the set of all states reachable from s*init* while choosing only actions in Act^min. As there are no critical schedulers, (S₀, Act^min) does not contain positive cycles. So, there is a finite maximal weight w_s among paths leading from s*init* to s within S₀. Consider the following MDP N: it contains the MDP M and a new initial state t*init*. For each s ∈ S₀ and each α ∈ Act(s) \ Act^min(s), N also contains a new state t_{s,α}, reachable from t*init* via an action β_{s,α} with weight w_s and probability 1. In t_{s,α}, only action α is enabled, with the same probability distribution over successors and the same weight as in s. So in N, one has to decide immediately in which state to leave S₀, and one accumulates the maximal weight that can be accumulated in M to reach this state in S₀. In this way, we ensure that Pr^min_{N,t_init}(♦*goal*) > 0.

**Proposition 6.** *The constructed MDP $\mathcal{N}$ satisfies $CE^{\sup}_{\mathcal{N},t_{init}} = CE^{\sup}_{\mathcal{M},s_{init}}$.*

We can rely on this reduction to an MDP in which *goal* is reached with positive probability for $\varepsilon$-approximations and for the exact computation of the optimal conditional expectation. In particular, the values $w_s$ for $s \in S_0$ are easy to compute by classical shortest-path algorithms on weighted graphs. Furthermore, we can now decide the finiteness of the maximal conditional expectation.

**Proposition 7.** *For an arbitrary MDP $\mathcal{M}$, $CE^{\sup}_{\mathcal{M},s_{init}}$ is finite if and only if there are no positively weight-divergent end components and no critical schedulers.*

**Partial and Conditional Expectations in Markov Chains.** Markov chains with integer weights can be seen as MDPs with only one action $\alpha$ enabled in every state. Consequently, there is only one scheduler for a Markov chain. Hence, we drop the superscripts in $p^{\max}$ and $PE^{\sup}$.

**Proposition 8.** *The partial and conditional expectations in a Markov chain $\mathcal{C}$ are computable in polynomial time.*

*Proof.* Let $\alpha$ be the only action available in $\mathcal{C}$. Assume that all states from which *goal* is not reachable have been collapsed to an absorbing state *fail*. Then $PE_{\mathcal{C},s_{init}}$ is the value of $x_{s_{init}}$ in the unique solution to the following system of linear equations with one variable $x_s$ for each state $s$:

$$\begin{aligned} x_{goal} &= x_{fail} = 0, \\ x_s &= wgt(s, \alpha) \cdot p_s + \sum_t P(s, \alpha, t) \cdot x_t \quad \text{for } s \in S \setminus \{goal, fail\}. \end{aligned}$$

The existence of a unique solution follows from the fact that $\{goal\}$ and $\{fail\}$ are the only end components (see [18]). It is straightforward to check that $(PE_{\mathcal{C},s})_{s\in S}$ is this unique solution. The conditional expectation is obtained from the partial expectation by dividing by the probability $p_{s_{init}}$ of reaching the goal.
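Concretely, the system can be evaluated on a small chain with exact arithmetic. The following sketch uses a hypothetical two-state chain (not from the paper); since the chain is acyclic, back-substitution in reverse topological order suffices to solve both the reachability equations for $p_s$ and the linear system for $x_s$:

```python
from fractions import Fraction as F

# Hypothetical Markov chain: states s0, s1 plus absorbing goal/fail.
# P[s] maps each successor to its probability; wgt[s] is the weight of
# the unique action available in s.
P = {
    "s0": {"goal": F(1, 2), "s1": F(1, 2)},
    "s1": {"goal": F(1, 2), "fail": F(1, 2)},
}
wgt = {"s0": 2, "s1": 3}

# Reachability probabilities p_s of eventually hitting goal,
# solved by back-substitution (the chain is acyclic here).
p = {"goal": F(1), "fail": F(0)}
for s in ("s1", "s0"):
    p[s] = sum(pr * p[t] for t, pr in P[s].items())

# Partial expectations: x_goal = x_fail = 0 and
# x_s = wgt(s) * p_s + sum_t P(s,t) * x_t.
x = {"goal": F(0), "fail": F(0)}
for s in ("s1", "s0"):
    x[s] = wgt[s] * p[s] + sum(pr * x[t] for t, pr in P[s].items())

print(x["s0"], x["s0"] / p["s0"])  # partial and conditional expectation
```

For a chain with cycles, the same equations would be handed to a general linear solver instead of back-substitution; the equations themselves are unchanged.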

This result can be seen as a special case of the following one. If we restrict ourselves to schedulers which reach the goal with maximal or minimal probability in an MDP without positively weight-divergent end components, linear programming allows us to compute the following two memoryless deterministic schedulers (see [3,8]).

**Proposition 9.** *Let $\mathcal{M}$ be an MDP without positively weight-divergent end components. There is a scheduler $\mathfrak{Max} \in MD_{\mathcal{M}}$ such that for each $s \in S$ we have $\Pr^{\mathfrak{Max}}_s(\lozenge goal) = p^{\max}_s$ and $PE^{\mathfrak{Max}}_s = \sup_{\mathfrak{S}} PE^{\mathfrak{S}}_s$, where the supremum is taken over all schedulers $\mathfrak{S}$ with $\Pr^{\mathfrak{S}}_s(\lozenge goal) = p^{\max}_s$. Similarly, there is a scheduler $\mathfrak{Min} \in MD_{\mathcal{M}}$ maximizing the partial expectation among all schedulers reaching the goal with minimal probability. Both these schedulers and their partial expectations are computable in polynomial time.*

These schedulers will play a crucial role for the approximation of the maximal partial expectation and the exact computation of maximal partial expectations in MDPs with non-negative weights.

**Partial Expectations in MDPs with Non-negative Weights.** In [8], the computation of maximal partial expectations in stochastic multiplayer games with non-negative weights is presented. We adapt this approach to MDPs with non-negative weights. A key result is the existence of a *saturation point*, a bound on the accumulated weight above which optimal schedulers do not need memory.

In the sequel, let $R \in \mathbb{Q}$ be arbitrary, let $\mathcal{M}$ be an MDP with non-negative weights and $PE^{\sup}_{s_{init}} < \infty$, and assume that all end components have negative maximal mean payoff (see Proposition 4). A saturation point for bias $R$ is a natural number $p$ such that there is a scheduler $\mathfrak{S}$ with $PE^{\mathfrak{S}}_{s_{init}}[R] = PE^{\sup}_{s_{init}}[R]$ which is memoryless and deterministic as soon as the accumulated weight reaches $p$, i.e., for any two paths $\pi$ and $\pi'$ with $last(\pi) = last(\pi')$ and $wgt(\pi), wgt(\pi') > p$, we have $\mathfrak{S}(\pi) = \mathfrak{S}(\pi')$.

Transferring the idea behind the saturation point for conditional expectations given in [3], we provide the following saturation point, which can be considerably smaller than the saturation point given in [8] for stochastic multiplayer games. Detailed proofs for this section are given in Appendix A.2 of [17].

**Proposition 10.** *We define $p^{\max}_{s,\alpha} := \sum_{t\in S} P(s,\alpha,t)\cdot p^{\max}_t$ and $PE^{\mathfrak{Max}}_{s,\alpha} := p^{\max}_{s,\alpha}\cdot wgt(s,\alpha) + \sum_{t\in S} P(s,\alpha,t)\cdot PE^{\mathfrak{Max}}_t$. Then,*

$$\mathfrak{p}_R := \sup \left\{ \left. \frac{PE^{\mathfrak{Max}}_{s,\alpha} - PE^{\mathfrak{Max}}_s}{p^{\max}_s - p^{\max}_{s,\alpha}} \right| s \in S,\ \alpha \in Act(s) \setminus Act^{\max}(s) \right\} - R$$

*is an upper saturation point for bias $R$ in $\mathcal{M}$.*

The saturation point $\mathfrak{p}_R$ is chosen such that, as soon as the accumulated weight exceeds $\mathfrak{p}_R$, the scheduler $\mathfrak{Max}$ is better than any scheduler deviating from $\mathfrak{Max}$ for only one step. So, the proposition states that $\mathfrak{Max}$ is then also better than any other scheduler.

As all values involved in the computation can be determined by linear programming, the saturation point $\mathfrak{p}_R$ is computable in polynomial time. This also means that the logarithmic length of $\mathfrak{p}_R$ is polynomial in the size of $\mathcal{M}$, and hence $\mathfrak{p}_R$ itself is at most exponential in the size of $\mathcal{M}$.

**Proposition 11.** *Let $R \in \mathbb{Q}$, let $B_R$ be the least integer greater than or equal to $\mathfrak{p}_R + \max_{s\in S,\alpha\in Act(s)} wgt(s,\alpha)$, and let $S' := S \setminus \{goal, fail\}$. The values $(PE^{\sup}_{s}[r+R])_{s\in S',\,0\le r\le B_R}$ form the unique solution to the following linear program in the variables $(x_{s,r})_{s\in S',\,0\le r\le B_R}$ (where $r$ ranges over integers):*

*Minimize $\sum_{s\in S',\,0\le r\le B_R} x_{s,r}$ under the following constraints:*

$$\begin{aligned} &\text{for } r \ge \mathfrak{p}_R: && x_{s,r} = p^{\max}_s \cdot (r+R) + PE^{\mathfrak{Max}}_s, \\ &\text{for } r < \mathfrak{p}_R \text{ and } \alpha \in Act(s): && x_{s,r} \ge P(s,\alpha,goal)\cdot(r+R+wgt(s,\alpha)) + \sum_{t\in S'} P(s,\alpha,t)\cdot x_{t,\,r+wgt(s,\alpha)}. \end{aligned}$$

From a solution $x$ to the linear program, we can easily extract an optimal weight-based deterministic scheduler. This scheduler only needs finite memory: the accumulated weight increases monotonically along paths, and as soon as the saturation point is reached, $\mathfrak{Max}$ provides the optimal decisions. As $B_R$ can be exponential in the size of $\mathcal{M}$, the computation of the optimal partial expectation via this linear program runs in time exponential in the size of $\mathcal{M}$.
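The constraints can be made concrete on a toy MDP (a hypothetical example, not from the paper): a single non-terminal state $s$ whose two actions lead directly to *goal* or *fail*, so each $x_{s,r}$ reduces to $\max_\alpha P(s,\alpha,goal)\cdot(r+wgt(s,\alpha))$ with no recursive term. The sketch shows how the optimal action depends on the accumulated weight $r$ and stabilizes above the switch point:

```python
from fractions import Fraction as F

# Hypothetical MDP: one non-terminal state s with two actions, each
# moving to goal or fail in a single step.
#   action: (weight, probability of reaching goal)
actions = {"alpha": (1, F(1, 2)), "beta": (10, F(1, 4))}

def best(r):
    """Optimal partial expectation and action in s at accumulated weight r."""
    vals = {a: pg * (r + w) for a, (w, pg) in actions.items()}
    a = max(vals, key=lambda k: (vals[k], k))
    return vals[a], a

# Below the switch point r = 8, the high-weight/low-probability action
# beta wins; above it, the high-probability action alpha is optimal --
# a weight-dependent decision that becomes stable, as a saturation
# point guarantees in the non-negative-weight setting.
for r in range(12):
    v, a = best(r)
    print(r, a, v)
```

Here the switch point solves $(r+1)/2 = (r+10)/4$, i.e. $r = 8$; in a general MDP the corresponding threshold is the saturation point $\mathfrak{p}_R$ above.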

#### **4 Existence of Optimal Schedulers**

We prove that there are optimal weight-based deterministic schedulers for partial and conditional expectations. After showing that, if finite, $PE^{\sup}_{s_{init}}$ is equal to $\sup_{\mathfrak{S}\in WD_{\mathcal{M}}} PE^{\mathfrak{S}}_{s_{init}}$, we take an analytic approach to show that there is a weight-based deterministic scheduler maximizing the partial expectation. We define a metric on $WD_{\mathcal{M}}$ turning it into a compact space. Then, we prove that the function assigning to each scheduler its partial expectation is upper semi-continuous. We conclude that there is a weight-based deterministic scheduler attaining the maximum. Proofs for this section can be found in Appendix B of [17].

**Proposition 12.** *Let $\mathcal{M}$ be an MDP with $PE^{\sup}_{s_{init}} < \infty$. Then we have $PE^{\sup}_{s_{init}} = \sup_{\mathfrak{S}\in WD_{\mathcal{M}}} PE^{\mathfrak{S}}_{s_{init}}$.*

*Proof sketch.* We can assume that all end components have negative maximal expected mean payoff (see Proposition 4). Given a scheduler $\mathfrak{S} \in HR_{\mathcal{M}}$, for each state-weight pair $(s,w)$ we consider the expected number of times $\theta_{s,w}$ that $s$ is visited with accumulated weight $w$ under $\mathfrak{S}$, and the expected number of times $\theta_{s,w,\alpha}$ that $\mathfrak{S}$ then chooses $\alpha$. These values are finite due to the negative maximal mean payoff in end components. We define the scheduler $\mathfrak{T} \in WR_{\mathcal{M}}$ choosing $\alpha$ in $s$ with probability $\theta_{s,w,\alpha}/\theta_{s,w}$ when weight $w$ has been accumulated. Then, we show by standard arguments that we can replace all probability distributions chosen by $\mathfrak{T}$ with Dirac distributions to obtain a scheduler $\mathfrak{T}' \in WD_{\mathcal{M}}$ such that $PE^{\mathfrak{T}'}_{s_{init}} \ge PE^{\mathfrak{S}}_{s_{init}}$.

It remains to show that the supremum is attained by a weight-based deterministic scheduler. Given an MDP $\mathcal{M}$ with arbitrary integer weights, we define the following metric $d_{\mathcal{M}}$ on the set of weight-based deterministic schedulers, i.e., on the set of functions from $S \times \mathbb{Z}$ to $Act$: for two such schedulers $\mathfrak{S}$ and $\mathfrak{T}$, we let $d_{\mathcal{M}}(\mathfrak{S},\mathfrak{T}) := 2^{-R}$, where $R$ is the greatest natural number such that $\mathfrak{S}\restriction S\times\{-(R{-}1),\dots,R{-}1\} = \mathfrak{T}\restriction S\times\{-(R{-}1),\dots,R{-}1\}$, or $\infty$ if there is no greatest such natural number.

**Lemma 13.** *The metric space $(Act^{S\times\mathbb{Z}}, d_{\mathcal{M}})$ is compact.*

Having defined this compact space of schedulers, we can rely on the analytic notion of upper semi-continuity.

**Lemma 14** (Upper Semi-Continuity of Partial Expectations)**.** *If $PE^{\sup}_{s_{init}}$ is finite in $\mathcal{M}$, then the function $PE : (WD, d_{\mathcal{M}}) \to (\mathbb{R}_{\infty}, d_{euclid})$ assigning $PE^{\mathfrak{S}}_{s_{init}}$ to a weight-based deterministic scheduler $\mathfrak{S}$ is upper semi-continuous.*

The technical proof of this lemma can be found in Appendix B of [17]. We arrive at the main result of this section.

**Theorem 15** (Existence of Optimal Schedulers for Partial Expectations)**.** *If $PE^{\sup}_{s_{init}}$ is finite in an MDP $\mathcal{M}$, then there is a weight-based deterministic scheduler $\mathfrak{S}$ with $PE^{\sup}_{s_{init}} = PE^{\mathfrak{S}}_{s_{init}}$.*

*Proof.* If $PE^{\sup}_{s_{init}}$ is finite, then the map $PE : (WD, d_{\mathcal{M}}) \to (\mathbb{R}_{\infty}, d_{euclid})$ is upper semi-continuous. So, this map attains a maximum because $(WD, d_{\mathcal{M}})$ is a compact metric space.

**Corollary 16** (Existence of Optimal Schedulers for Conditional Expectations)**.** *If $CE^{\sup}_{s_{init}}$ is finite in an MDP $\mathcal{M}$, then there is a weight-based deterministic scheduler $\mathfrak{S}$ with $CE^{\sup}_{s_{init}} = CE^{\mathfrak{S}}_{s_{init}}$.*

*Proof.* By Proposition 6, we can assume that $\Pr^{\min}_{s_{init}}(\lozenge goal) > 0$. We know that $PE^{\sup}_{s_{init}}[-CE^{\sup}_{s_{init}}] = 0$ and that there is a weight-based deterministic scheduler $\mathfrak{S}$ with $PE^{\mathfrak{S}}_{s_{init}}[-CE^{\sup}_{s_{init}}] = 0$. By Proposition 2, $\mathfrak{S}$ maximizes the conditional expectation, as it reaches *goal* with positive probability.

**Fig. 2.** All non-trivial transition probabilities are 1/2. In the MDP $\mathcal{M}$, the optimal choice to maximize the partial expectation in $t$ depends on the parity of the accumulated weight.

In MDPs with non-negative weights, the optimal decision in a state s only depends on s as soon as the accumulated weight exceeds a saturation point. In MDPs with arbitrary integer weights, it is possible that the optimal choice of action does not become stable for increasing values of accumulated weight as we see in the following example.

**Example 17.** Let us first consider the MDP $\mathcal{N}$ depicted in Fig. 2. Let $\pi$ be a path reaching $t$ for the first time with accumulated weight $r$. Consider a scheduler which chooses $\beta$ for the first $k$ times and then $\alpha$. In this situation, the partial expectation from this point on is:

$$\frac{1}{2^{k+1}}(r-k) + \sum\_{i=1}^{k} \frac{1}{2^i}(r-i) = \frac{1}{2^{k+1}} + \sum\_{i=1}^{k+1} \frac{1}{2^i}(r-i) = \frac{k-r+4}{2^{k+1}} + r - 2.$$

For $r \ge 2$, this partial expectation has its unique maximum for the choice $k = r-2$. This already shows that an optimal scheduler needs infinite memory: no matter how much weight $r$ has been accumulated when reaching $t$, the optimal scheduler has to count the $r-2$ times it chooses $\beta$.
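The displayed identity and the optimality of $k = r-2$ can be checked mechanically with exact arithmetic; the following sketch compares the sum with the closed form and with the maximum over a range of choices of $k$:

```python
from fractions import Fraction as F

def pe(r, k):
    """Partial expectation from t with weight r when beta is chosen
    k times before switching to alpha (the sum in the text)."""
    return F(r - k, 2 ** (k + 1)) + sum(F(r - i, 2 ** i) for i in range(1, k + 1))

def closed(r, k):
    """Closed form (k - r + 4) / 2^(k+1) + r - 2 from the text."""
    return F(k - r + 4, 2 ** (k + 1)) + r - 2

for r in range(2, 8):
    # the identity holds for every k ...
    assert all(pe(r, k) == closed(r, k) for k in range(30))
    # ... and k = r - 2 attains the maximum over the sampled range
    assert pe(r, r - 2) == max(pe(r, k) for k in range(30))
print("identity and optimality of k = r - 2 verified")
```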

Furthermore, we can transfer the optimal scheduler for the MDP $\mathcal{N}$ to the MDP $\mathcal{M}$. In state $t$, we have to make a nondeterministic choice between two actions leading to the states $q_0$ and $q_1$, respectively. In both of these states, an action $\beta$ is enabled which behaves like the same action in the MDP $\mathcal{N}$, except that it moves between the two states if *goal* is not reached. So, the action $\alpha$ is only enabled every other step. As in $\mathcal{N}$, we want to choose $\alpha$ after choosing $\beta$ $r-2$ times if we arrived in $t$ with accumulated weight $r \ge 2$. So, the choice in $t$ depends on the parity of $r$: for $r = 1$ or $r$ even, we choose $\delta$. For odd $r \ge 3$, we choose $\gamma$. This shows that the optimal scheduler in the MDP $\mathcal{M}$ needs specific information about the accumulated weight, in this case its parity, no matter how much weight has been accumulated.

In the example, the optimal scheduler has a periodic behavior when fixing a state and looking at optimal decisions for increasing values of accumulated weight. The question whether an optimal scheduler always has such a periodic behavior remains open.

### **5 Approximation**

As the optimal values for partial and conditional expectation can be irrational, there is no hope to compute these values by linear programming as in the case of non-negative weights. In this section, we show how we can nevertheless approximate the values. The main result is the following.

**Theorem 18.** *Let $\mathcal{M}$ be an MDP with $PE^{\sup}_{\mathcal{M},s_{init}} < \infty$ and let $\varepsilon > 0$. The maximal partial expectation $PE^{\sup}_{\mathcal{M},s_{init}}$ can be approximated up to an absolute error of $\varepsilon$ in time exponential in the size of $\mathcal{M}$ and polynomial in $\log(1/\varepsilon)$. If, furthermore, $CE^{\sup}_{\mathcal{M},s_{init}} < \infty$, then $CE^{\sup}_{\mathcal{M},s_{init}}$ can also be approximated up to an absolute error of $\varepsilon$ in time exponential in the size of $\mathcal{M}$ and polynomial in $\log(1/\varepsilon)$.*

We first prove that upper bounds for $PE^{\sup}_{\mathcal{M},s_{init}}$ and $CE^{\sup}_{\mathcal{M},s_{init}}$ can be computed in polynomial time. Then, we show that there are $\varepsilon$-optimal schedulers for the partial expectation which become memoryless as soon as the accumulated weight leaves a sufficiently large weight window around 0. We compute the optimal partial expectation of such a scheduler by linear programming. The result can then be extended to conditional expectations.

**Upper Bounds.** Let $\mathcal{M}$ be an MDP in which all end components have negative maximal mean payoff. Let $\delta$ be the minimal non-zero transition probability in $\mathcal{M}$ and $W := \max_{s\in S,\alpha\in Act(s)} |wgt(s,\alpha)|$. Moving through the MEC-quotient, the probability to reach an accumulated weight of $|S| \cdot W$ is bounded by $1 - \delta^{|S|}$, as *goal* or *fail* is reached within $|S|$ steps with probability at least $\delta^{|S|}$. It remains to show similar bounds inside an end component.

We will use the characterization of the maximal mean payoff in terms of super-harmonic vectors due to Hordijk and Kallenberg [13] to define a supermartingale controlling the growth of the accumulated weight in an end component under any scheduler. As the value vector for the maximal mean payoff in an end component is constant and negative in our case, the results of [13] yield:

**Proposition 19** (Hordijk, Kallenberg)**.** *Let $\mathcal{E} = (S, Act)$ be an end component with maximal mean payoff $-t$ for some $t > 0$. Then there is a vector $(u_s)_{s\in S}$ such that $-t + u_s \ge wgt(s,\alpha) + \sum_{s'\in S} P(s,\alpha,s')\cdot u_{s'}$ for all $s \in S$ and $\alpha \in Act(s)$.*

*Furthermore, let $v$ be the vector $(-t,\dots,-t)$ in $\mathbb{R}^S$. Then $(v, u)$ is the solution to a linear program with $2|S|$ variables, $2|S||Act|$ inequalities, and coefficients formed from the transition probabilities and weights in $\mathcal{E}$.*

We will call the vector $u$ a *super-potential*, because the expected accumulated weight after $i$ steps is at most $u_s - \min_{t\in S} u_t - i\cdot t$ when starting in state $s$. Let $\mathfrak{S}$ be a scheduler for $\mathcal{E}$ starting in some state $s$. We define the following random variables on $\mathfrak{S}$-runs in $\mathcal{E}$: let $s(i) \in S$ be the state after $i$ steps, let $\alpha(i)$ be the action chosen after $i$ steps, let $w(i)$ be the accumulated weight after $i$ steps, and let $\pi(i)$ be the history, i.e., the finite path after $i$ steps.

**Lemma 20.** *The sequence $m(i) := w(i) + u_{s(i)}$ satisfies $E(m(i+1)\mid \pi(0),\dots,\pi(i)) \le m(i) - t$ for all $i$.*<sup>1</sup>

*Proof.* By Proposition 19, $E(m(i+1)\mid \pi(0),\dots,\pi(i)) - m(i) = wgt(s(i),\mathfrak{S}(\pi(i))) + \sum_{s'\in S} P(s(i),\mathfrak{S}(\pi(i)),s')\cdot u_{s'} - u_{s(i)} \le -t$.

We are going to apply the following theorem by Blackwell [6].

**Theorem 21** (Blackwell [6])**.** *Let $X_1, X_2, \dots$ be random variables, and let $S_n := \sum_{k=1}^n X_k$. Assume that $|X_i| \le 1$ for all $i$ and that there is a $u > 0$ such that $E(X_{n+1}\mid X_1,\dots,X_n) \le -u$. Then, $\Pr(\sup_{n\in\mathbb{N}} S_n \ge t) \le \left(\frac{1-u}{1+u}\right)^t$.*

<sup>1</sup> This means that $m(i) + i \cdot t$ is a super-martingale with respect to the history $\pi(i)$.
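As a quick sanity check of Theorem 21 (our illustration, not part of the paper's argument), consider a walk with increments $\pm 1$ where $\Pr(+1) = 1/4$, so $E(X) = -1/2 = -u$ and the bound is $(1/3)^t$; by the standard gambler's-ruin formula the hitting probability of level $t$ is here exactly $(p/(1-p))^t = (1/3)^t$, so the bound is tight in this case. A seeded Monte Carlo estimate stays near it:

```python
import random

random.seed(0)

u = 0.5                             # drift: 0.25 * 1 + 0.75 * (-1) = -0.5
t = 3                               # level whose hitting probability we estimate
bound = ((1 - u) / (1 + u)) ** t    # Blackwell's bound, here (1/3)^3

trials, hits = 20000, 0
for _ in range(trials):
    s = 0
    for _ in range(150):            # long enough that later hits are negligible
        s += 1 if random.random() < 0.25 else -1
        if s >= t:
            hits += 1
            break

estimate = hits / trials
print(estimate, bound)              # estimate stays below the bound (up to noise)
```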

We denote $\max_{s'\in S} u_{s'} - \min_{s'\in S} u_{s'}$ by $\|u\|$. Observe that $|m(i+1) - m(i)| \le \|u\| + W =: c_{\mathcal{E}}$. We can rescale the sequence $m(i)$ by defining $m'(i) := (m(i) - m(0))/c_{\mathcal{E}}$. This ensures that $m'(0) = 0$, $|m'(i+1) - m'(i)| \le 1$, and $E(m'(i+1)\mid m'(0),\dots,m'(i)) \le -t/c_{\mathcal{E}}$ for all $i$. In this way, we arrive at the following conclusion, putting $\lambda_{\mathcal{E}} := \frac{1-t/c_{\mathcal{E}}}{1+t/c_{\mathcal{E}}}$.

**Corollary 22.** *For any scheduler $\mathfrak{S}$ and any starting state $s$ in $\mathcal{E}$, we have $\Pr^{\mathfrak{S}}_s(\lozenge wgt \ge (k+1)\cdot c_{\mathcal{E}}) \le \lambda^k_{\mathcal{E}}$.*

*Proof.* By Theorem 21, $\Pr^{\mathfrak{S}}_s(\lozenge wgt \ge (k+1)\cdot c_{\mathcal{E}}) \le \Pr^{\mathfrak{S}}_s(\lozenge wgt \ge \|u\| + k\cdot c_{\mathcal{E}}) \le \Pr^{\mathfrak{S}}_s(\sup_{i\in\mathbb{N}} m'(i) \ge k) \le \left(\frac{1-t/c_{\mathcal{E}}}{1+t/c_{\mathcal{E}}}\right)^k$. □

Let $MEC$ be the set of maximal end components in $\mathcal{M}$. For each $\mathcal{E} \in MEC$, let $\lambda_{\mathcal{E}}$ and $c_{\mathcal{E}}$ be as in Corollary 22. Define $\lambda_{\mathcal{M}} := 1 - \delta^{|S|}\cdot\prod_{\mathcal{E}\in MEC}(1-\lambda_{\mathcal{E}})$ and $c_{\mathcal{M}} := |S|\cdot W + \sum_{\mathcal{E}\in MEC} c_{\mathcal{E}}$. Then an accumulated weight of $c_{\mathcal{M}}$ cannot be reached with a probability greater than $\lambda_{\mathcal{M}}$: reaching accumulated weight $c_{\mathcal{M}}$ would require reaching weight $c_{\mathcal{E}}$ in some end component $\mathcal{E}$ or reaching weight $|S|\cdot W$ in the MEC-quotient, and $1-\lambda_{\mathcal{M}}$ is a lower bound on the probability that none of this happens (under any scheduler).

**Proposition 23.** *Let $\mathcal{M}$ be an MDP with $PE^{\sup}_{s_{init}} < \infty$. There is an upper bound $PE^{ub}$ for the partial expectation in $\mathcal{M}$ computable in polynomial time.*

*Proof.* In any end component $\mathcal{E}$, the maximal mean payoff $-t$ and the super-potential $u$ are computable in polynomial time. Hence $c_{\mathcal{E}}$ and $\lambda_{\mathcal{E}}$, and in turn also $c_{\mathcal{M}}$ and $\lambda_{\mathcal{M}}$, are computable in polynomial time. When we reach accumulated weight $c_{\mathcal{M}}$ for the first time, the actual accumulated weight is at most $c_{\mathcal{M}} + W$. So, we conclude that $\Pr^{\max}_s(\lozenge wgt \ge k\cdot(c_{\mathcal{M}}+W)) \le \lambda^k_{\mathcal{M}}$. The partial expectation can now be bounded by $\sum_{k=0}^{\infty}(k+1)\cdot(c_{\mathcal{M}}+W)\cdot\lambda^k_{\mathcal{M}} = \frac{c_{\mathcal{M}}+W}{(1-\lambda_{\mathcal{M}})^2}$.

**Corollary 24.** *Let $\mathcal{M}$ be an MDP with $CE^{\sup}_{\mathcal{M},s_{init}} < \infty$. There is an upper bound $CE^{ub}$ for the conditional expectation in $\mathcal{M}$ computable in polynomial time.*

*Proof.* By Proposition 6, we can construct in polynomial time an MDP $\mathcal{N}$ in which *goal* is reached with probability $q > 0$ and $CE^{\sup}_{\mathcal{M},s_{init}} = CE^{\sup}_{\mathcal{N},t_{init}}$. Now, $CE^{ub} := PE^{ub}/q$ is an upper bound for the conditional expectation in $\mathcal{M}$.

**Approximating Optimal Partial Expectations.** The idea for the approximation is to assume that the partial expectation is $PE^{\mathfrak{Max}}_s + w\cdot p^{\max}_s$ if a high weight $w$ has been accumulated in state $s$. Similarly, for small weights $w'$, we use the value $PE^{\mathfrak{Min}}_s + w'\cdot p^{\min}_s$. We will first provide a lower "saturation point" ensuring that only actions minimizing the probability of reaching the goal are used by an optimal scheduler as soon as the accumulated weight drops below this saturation point. For the proofs for this section, see Appendix C.1 of [17].

**Proposition 25.** *Let $\mathcal{M}$ be an MDP with $PE^{\sup}_{s_{init}} < \infty$. Let $s \in S$ and let $q_s := \dfrac{PE^{ub} - PE^{\mathfrak{Min}}_s}{p^{\min}_s - \min_{\alpha\in Act(s)\setminus Act^{\min}(s)} p^{\min}_{s,\alpha}}$. Then any weight-based deterministic scheduler $\mathfrak{S}$ maximizing the partial expectation in $\mathcal{M}$ satisfies $\mathfrak{S}(s,w) \in Act^{\min}(s)$ if $w \le q_s$.*

Let $q := \min_{s\in S} q_s$ and let $D := PE^{ub} - \min\{PE^{\mathfrak{Max}}_s, PE^{\mathfrak{Min}}_s \mid s \in S\}$. Given $\varepsilon > 0$, we define $R^+_\varepsilon := (c_{\mathcal{M}} + W)\cdot\frac{\log(2D)+\log(1/\varepsilon)}{\log(1/\lambda_{\mathcal{M}})}$ and $R^-_\varepsilon := q - R^+_\varepsilon$.

**Theorem 26.** *There is a weight-based deterministic scheduler $\mathfrak{S}$ such that the scheduler $\mathfrak{T}$ defined by*

$$\mathfrak{T}(\pi) = \begin{cases} \mathfrak{S}(\pi) & \text{if every prefix } \pi' \text{ of } \pi \text{ satisfies } R^-_\varepsilon \le wgt(\pi') \le R^+_\varepsilon, \\ \mathfrak{Max}(\pi) & \text{if the shortest prefix } \pi' \text{ of } \pi \text{ with } wgt(\pi') \notin [R^-_\varepsilon, R^+_\varepsilon] \\ & \quad \text{satisfies } wgt(\pi') > R^+_\varepsilon, \\ \mathfrak{Min}(\pi) & \text{otherwise}, \end{cases}$$

*satisfies $PE^{\mathfrak{T}}_{s_{init}} \ge PE^{\sup}_{s_{init}} - \varepsilon$.*

This result now allows us to compute an $\varepsilon$-approximation and an $\varepsilon$-optimal scheduler with finite memory by linear programming, similar to the case of non-negative weights, using a linear program with $R^+_\varepsilon + |R^-_\varepsilon|$ many variables and $|Act|$ times as many inequalities.

**Corollary 27.** *$PE^{\sup}_{s_{init}}$ can be approximated up to an absolute error of $\varepsilon$ in time exponential in the size of $\mathcal{M}$ and polynomial in $\log(1/\varepsilon)$.*

If the logarithmic length of $\theta \in \mathbb{Q}$ is polynomial in the size of $\mathcal{M}$, we can also approximate $PE^{\sup}_{s_{init}}[\theta]$ up to an absolute error of $\varepsilon$ in time exponential in the size of $\mathcal{M}$ and polynomial in $\log(1/\varepsilon)$: we can add a new initial state $s$ with a transition to $s_{init}$ with weight $\theta$ and approximate $PE^{\sup}_s$ in the new MDP.

**Transfer to Conditional Expectations.** Let $\mathcal{M}$ be an MDP with $CE^{\sup}_{s_{init}} < \infty$ and let $\varepsilon > 0$. By Proposition 6, we can assume that $\Pr^{\min}_{\mathcal{M},s_{init}}(\lozenge goal) =: p$ is positive. Clearly, $CE^{\sup}_{s_{init}} \in [CE^{\mathfrak{Max}}_{s_{init}}, CE^{ub}]$. We perform a binary search to approximate $CE^{\sup}_{s_{init}}$: we put $A_0 := CE^{\mathfrak{Max}}_{s_{init}}$ and $B_0 := CE^{ub}$. Given $A_i$ and $B_i$, let $\theta_i := (A_i + B_i)/2$. Then, we approximate $PE^{\sup}_{s_{init}}[-\theta_i]$ up to an absolute error of $p\cdot\varepsilon$. Let $E_i$ be the value of this approximation. If $E_i \in [-2p\varepsilon, 2p\varepsilon]$, we terminate and return $\theta_i$ as the approximation for $CE^{\sup}_{s_{init}}$. If $E_i < -2p\varepsilon$, we put $A_{i+1} := A_i$ and $B_{i+1} := \theta_i$, and repeat. If $E_i > 2p\varepsilon$, we put $A_{i+1} := \theta_i$ and $B_{i+1} := B_i$, and repeat.
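The search can be sketched with a mock oracle standing in for the approximation of $PE^{\sup}_{s_{init}}[-\theta]$ (a hypothetical setting, not from the paper: a single scheduler reaches the goal with probability $p$ and has conditional expectation $c$, so the oracle returns $p\cdot(c-\theta)$ exactly):

```python
def approx_ce(pe_approx, a, b, p, eps):
    """Binary search for CE_sup, given an oracle pe_approx that returns
    PE_sup[-theta] up to absolute error p*eps (mocked exactly below)."""
    while True:
        theta = (a + b) / 2
        e = pe_approx(theta)              # approximation of PE_sup[-theta]
        if -2 * p * eps <= e <= 2 * p * eps:
            return theta                  # within 3*eps of CE_sup
        if e < -2 * p * eps:
            b = theta                     # theta overshoots CE_sup: search lower half
        else:
            a = theta                     # theta undershoots CE_sup: search upper half

# Mock setting: p = 1/2 and CE_sup = c = 3.7, so PE_sup[-theta] = p * (c - theta),
# which is zero exactly at theta = CE_sup.
p, c = 0.5, 3.7
theta = approx_ce(lambda th: p * (c - th), a=0.0, b=10.0, p=p, eps=1e-6)
print(theta)   # close to 3.7
```

The invariant is that $CE^{\sup}_{s_{init}}$ stays inside $[A_i, B_i]$, which halves in each iteration, giving the iteration bound of Proposition 28.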

**Proposition 28.** *The procedure terminates after at most $\log((B_0 - A_0)/(p\cdot\varepsilon))$ iterations and returns a $3\varepsilon$-approximation of $CE^{\sup}_{s_{init}}$ in time exponential in the size of $\mathcal{M}$ and polynomial in $\log(1/\varepsilon)$.*

The proof can be found in Appendix C.2 of [17]. This finishes the proof of Theorem 18.

# **6 Conclusion**

Compared to the setting of non-negative weights, the optimization of partial and conditional expectations faces substantial new difficulties in the setting of integer weights. The optimal values can be irrational, showing that the linear programming approaches from the setting of non-negative weights cannot be applied to compute the optimal values. We showed that this approach can nevertheless be adapted to obtain approximation algorithms. Further, we were able to show that there are optimal weight-based deterministic schedulers. These schedulers, however, can require infinite memory, and it remains open whether we can further restrict the class of schedulers necessary for the optimization. In examples, we have seen that optimal schedulers can switch periodically between the actions they choose for increasing values of accumulated weight. Further insights into the behavior of optimal schedulers would be helpful to address threshold problems ("Is $PE^{\sup}_{s_{init}} \ge \theta$?").

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Equational Theories and Monads from Polynomial Cayley Representations**

Maciej Piróg(B), Piotr Polesiuk, and Filip Sieczkowski

University of Wrocław, Wrocław, Poland
mpirog@cs.uni.wroc.pl

**Abstract.** We generalise Cayley's theorem for monoids by providing an explicit formula for a (multi-sorted) equational theory represented by the type $PX \to X$, where $P$ is an arbitrary polynomial endofunctor with natural coefficients. From the computational perspective, examples of effects given by such theories include backtracking nondeterminism (obtained with the original Cayley representation $X \to X$), finite mutable state (obtained with $n \to X$, for a constant $n$), and their different combinations (via $n \times X \to X$ or $X^n \to X$). Moreover, we show that monads induced by such theories are implementable using the type formers available in programming languages based on a polymorphic λ-calculus, both as compositions of algebraic datatypes and as continuation-like monads. We give a set-theoretic model of the latter in terms of Barr-dinatural transformations. We also introduce CayMon, a tool that takes a polynomial as an input and generates the corresponding equational theory together with the two implementations of the induced monad in Haskell.

### **1 Introduction**

The relationship between universal algebra and monads has been studied at least since Linton [13] and Eilenberg and Moore [4], while the relationship between monads and the general theory of computational effects (exceptions, mutable state, nondeterminism, and such) has been observed by Moggi [14]. By transitivity, one can study computational effects using concepts from universal algebra, which is the main theme of Plotkin and Power's prolific research programme (see [10,20–24] among many others).

The simplest possible case of this approach is to describe an effect via a finitary equational theory: a finite set of operations (of finite arities), together with a finite set of equations. One such example is the theory of monoids:

Operations: $\gamma$, $\varepsilon$

Equations: $\gamma(x, \varepsilon) = x$, $\gamma(\varepsilon, x) = x$, $\gamma(\gamma(x, y), z) = \gamma(x, \gamma(y, z))$

The above reads that the signature of the theory consists of two operations: binary γ and nullary ε. The equations state that γ is associative, with ε being its left and right unit.<sup>1</sup> One can also read this theory as a specification of backtracking nondeterminism, in which the order of results matters, where γ is an operation that creates a new computation as a choice between two subcomputations, while ε denotes failure. The connection between the equational theory and the computational effect becomes apparent when we consider the monad of free monoids (that is, the list monad), which is in fact used to form backtracking computations in programming; see, for example, Bird's pearl [1].
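This correspondence can be illustrated with a small sketch in Python (our illustration of the list monad as backtracking, not the paper's Haskell development), where $\gamma$ is list concatenation, $\varepsilon$ the empty list, and the monadic bind is concat-map:

```python
def unit(x):
    """Return a single result."""
    return [x]

def bind(m, f):
    """Bind of the list monad: apply f to each result and concatenate."""
    return [y for x in m for y in f(x)]

# gamma is list concatenation, epsilon the empty list (a failed computation).
gamma = lambda xs, ys: xs + ys
epsilon = []

# Backtracking: pick x from 1..3 and y from 1..x, keep pairs with x + y even;
# the order of results is the order in which choices are explored.
pairs = bind(list(range(1, 4)),
             lambda x: bind(list(range(1, x + 1)),
                            lambda y: unit((x, y)) if (x + y) % 2 == 0 else epsilon))
print(pairs)
```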

This suggests a simple recipe for computational effects: it is enough to come up with an equational theory, and out of the hat comes the induced monad of free algebras that implements the corresponding effect. Such an approach is indeed possible in the category **Set**, where every finitary equational theory admits a free monad, constructed by quotienting terms over the signature by the congruence induced by the equations. However, if we want to implement this monad in a programming language, the situation is not so simple, since in most programming languages (in particular, those without higher inductive types) we cannot generally express this kind of quotients. For instance, to describe a variant of nondeterminism that does not admit duplicate results, we may extend the theory of monoids with an equation stating that γ is idempotent, that is, γ(x, x) = x. But, unlike in the case of general monoids, the monad induced by the theory of idempotent monoids seems to be no longer directly expressible in, say, Haskell. This means that there is no implementation that satisfies all the equations of the theory "on the nose"—one informal argument is that the representations of γ(x, x) and x should be the same whatever the type of x, and this would require a decidable equality test on every type, which is not possible.

Thus, both from the practical viewpoint of programming and as a question on the general nature of equational theories, it makes sense to ask which theories are "simple" enough to induce monads expressible using only the basic type formers, such as (co)products, function spaces, algebraic datatypes, universal and existential quantification. This question seems difficult in general, and to our knowledge there is little work that addresses it. In this paper, we focus on a small piece of this problem: we study a certain subset of such implementable equational theories, and conjure some novel extensions.

The monads that we consider arise from Cayley representations. The overall idea is that if a theory has an expressible, well-behaved (in a sense that we make precise in the paper) Cayley representation, the induced monad also has an expressible implementation. The well-known Cayley theorem for monoids states that every monoid with a carrier X embeds in the monoid of endofunctions X → X. In this paper, we generalise this result: given a polynomial **Set**-endofunctor P with natural coefficients, we provide an explicit formula for an equational theory such that every algebra of the theory with a carrier X embeds in a certain algebra with the carrier given by PX → X. Then, we show that the monad of

<sup>1</sup> Although one usually writes γ as an infix operation, we use a "functional" syntax, since, in the following, the arity of corresponding operations may vary.

free algebras of such a theory can be implemented as a continuation-like monad with the endofunctor given at a set A as:

$$\forall X. (A \to PX \to X) \to PX \to X \tag{1}$$

This type is certainly expressible in programming languages based on polymorphic λ-calculi, such as Haskell.
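For concreteness, the type (1) can be rendered in Haskell roughly as follows. This is our own sketch, not code from the paper: `Cayley` is a hypothetical name, and the parameter `p` stands for the polynomial functor P.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A continuation-like monad in the shape of the type (1):
-- forall x. (a -> p x -> x) -> p x -> x
newtype Cayley p a = Cayley { runCayley :: forall x. (a -> p x -> x) -> p x -> x }

instance Functor (Cayley p) where
  fmap f (Cayley m) = Cayley (\k -> m (k . f))

instance Applicative (Cayley p) where
  pure a = Cayley (\k -> k a)
  Cayley mf <*> Cayley ma = Cayley (\k -> mf (\f -> ma (k . f)))

instance Monad (Cayley p) where
  Cayley m >>= f = Cayley (\k -> m (\a -> runCayley (f a) k))

-- With p the identity functor (i.e., P X = X), the type collapses to
-- Church-encoded lists, as discussed in the text.
newtype Identity x = Identity x

toList :: Cayley Identity a -> [a]
toList (Cayley m) = m (\a (Identity xs) -> a : xs) (Identity [])
```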

However, before we can give the details of this construction, we need to address some technical issues. It is easy to notice that there may be more than one "Cayley representation" of a given theory: for example, a monoid X embeds not only in X → X, but also in a "smaller" monoid X →_γ X, by which we mean the monoid of functions of the type X → X of the shape a ↦ γ(b, a), where b ∈ X. The same monoid X embeds also in a "bigger" monoid X² → X, in which we interpret the operations as γ(f, g) = (x, y) ↦ f(g(x, y), y) and ε = (x, y) ↦ x. What makes X → X special is that instantiating (1) with PX = X gives a monad that is *isomorphic* to the list monad (note that in this case, the type (1) is simply the Church representation of lists). At the same time, we cannot use X →_γ X instead of X → X, since (1) quantifies over sets, and thus there is no natural candidate for γ. Moreover, even though we may use the instantiation PX = X², this choice yields a *different* monad (which we describe in more detail in Sect. 5.4). To sort this out, in Sect. 2, we introduce the notion of *tight Cayley representation*. This notion gives rise to the monad of the following shape, which is a strict generalisation of (1), where R is a **Set**-bifunctor of mixed variance:

$$\forall X. (A \to R(X, X)) \to R(X, X) \tag{2}$$

Formally, all our constructions are set-theoretic—to focus the presentation, the connection with programming languages and type theory is left implicit. Thus, the second issue that we discuss in Sect. 2 is the meaning of the universal quantifier ∀ in (1). It is known [27] that polymorphic functions of this shape enjoy a form of dinaturality proposed by Michael Barr (see Paré and Román [16]), called by Mulry *strong* dinaturality [15]. We model the universally quantified types above as collections of Barr-dinatural transformations, and prove that if R is a tight representation, the collection (2) is always a set.

In Sect. 4, we give the formula that defines an equational theory given a polynomial functor P. In general, the theories we construct can be multi-sorted, which is useful for avoiding a combinatorial explosion of the induced theories, hence a brief discussion of such theories in Sect. 3. We show that PX → X is indeed a tight representation of the generated theory. Then, in Sect. 5, we study a number of examples in order to discover what effects are denoted by the generated theories. It turns out that each theory can be seen as a (rather complex, for nontrivial polynomial functors) composition of backtracking nondeterminism and finite mutable state. Moreover, in Sect. 6, we show that the corresponding monads can be implemented not only as continuation-like monads (1), but also in "direct style", using algebraic datatypes.

Since they are parametrised by a polynomial, both the equational theory and its representation consist of many indexed components, so it is not necessarily trivial to get much intuition simply by looking at the formulas. To facilitate this, we have implemented a tool, called CayMon, that generates the theory from a given polynomial, and produces two implementations in Haskell: as a composition of algebraic datatypes and as a continuation-like ("Cayley") monad (1). The tool can be run in a web browser, and is available at http://pl-uwr.bitbucket.io/caymon.

# **2 Tight Cayley Representations**

In this section, we take a more abstract view on the concept of "Cayley representation". In the literature (for example, [2,5,17,25]), authors usually define Cayley representations of different forms of algebraic structures in terms of embeddings. This means that given an object X, there is a homomorphism σ : X → Y to a different object Y, and moreover σ has a retraction (not necessarily a homomorphism) ρ : Y → X (meaning ρ · σ = id). One important fact, which is usually left implicit, is that the construction of Y from X is in some sense functorial. Since we are interested in coming up with representations for many different equational theories, we first identify sufficient properties of such a representation needed to carry out the construction of the monad (2) sketched in the introduction. In particular, we introduce the notion of *tight Cayley representation*, which characterises the functoriality and naturality conditions for the components of the representation.

As for notation, we use A → B to denote both the type of a morphism in a category, and the set of all functions from A to B (the exponential object in **Set**). Also, for brevity, we write the application of a bifunctor to two arguments, e.g., G(X, Y), without parentheses, as GXY. We begin with the following definition:

**Definition 1 (see** [16]**).** *Let* C, D *be categories, and* G, H : C^op × C → D *be functors. Then, a collection of D-morphisms* θ_X : GXX → HXX *indexed by C-objects is called a* Barr-dinatural transformation *if it is the case that for all objects* A *in D, objects* X, Y *in C, morphisms* f₁ : A → GXX, f₂ : A → GYY *in D, and a morphism* g : X → Y *in C, whenever* GXg · f₁ = GgY · f₂ : A → GXY*, it follows that* HXg · θ_X · f₁ = HgY · θ_Y · f₂ : A → HXY*.*

An important property of Barr-dinaturality is that the component-wise composition gives a well-behaved notion of vertical composition of two such transformations. The connection between Barr-dinatural transformations and Cayley representations is suggested by the fact, shown by Paré and Román [16], that the collection of such transformations of type H → H for the **Set**-bifunctor H(X, Y) = X → Y is isomorphic to the set of natural numbers. The latter, equipped with addition and zero (or the former with composition and the identity transformation, respectively), is simply the free monoid with a single generator, that is, an instance of (1) with PX = X and A = 1.
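In Haskell terms (our illustration; the paper works set-theoretically, and parametric polymorphism here plays the role of Barr-dinaturality), inhabitants of the corresponding type are exactly the Church numerals:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Polymorphic transformations (x -> x) -> (x -> x) behave like natural
-- numbers, i.e., the free monoid on one generator: composition is
-- addition, and the identity transformation is zero.
type Church = forall x. (x -> x) -> x -> x

zero :: Church
zero = \_ z -> z

toNat :: Church -> Int
toNat n = n (+ 1) 0

fromNat :: Int -> Church
fromNat k = \s z -> iterate s z !! max 0 k

-- Vertical composition of transformations is addition of numerals.
addC :: Church -> Church -> Church
addC m n = \s z -> m s (n s z)
```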

For the remainder of this section, assume that *T* is a category, while F : **Set** → *T* is a functor with a right adjoint U : *T* → **Set**. Intuitively, *T* is a category of algebras of some theory, and U is the forgetful functor. Then, the monad generated by the theory is given by the composition UF. For a function f : A → UX, we write f* = U f̂ : UFA → UX, where f̂ : FA → X is the transpose of f via the adjunction (intuitively, the unique homomorphism induced by the freeness of the algebra FA).

**Definition 2.** *A* tight Cayley representation *of* T *with respect to* F ⊣ U *consists of the following components:*




Note that the condition (c) states that the objects R are, in a sense, natural. Intuitively, understanding an object RX as an algebra, the condition states that the algebraic structure of RX does not really depend on the set X. The condition (f) may seem rather complicated: the intuition behind the technical formulation is that RXY behaves like a form of a function space (after all, we are interested in abstract *Cayley* representations), and run_{X,i} is an application to an argument specified by i, as in the example below. In such a case, the joint monicity becomes the extensional equality of functions.

*Example 3.* Let us check how the Cayley representation for monoids fits the definition above: (a) The bifunctor is RXY = X → Y. (b) The *T*-object for a monoid M is the monoid M → M with γ(f, g) = f ∘ g and ε = id. (c) Given some elements a, b, ..., c ∈ A, we need to see that g ∘ f₁(a) ∘ f₁(b) ∘ ··· ∘ f₁(c) = f₂(a) ∘ f₂(b) ∘ ··· ∘ f₂(c) ∘ g. Fortunately, the assumption, which in this case becomes g ∘ f₁(a) = f₂(a) ∘ g for all a ∈ A, allows us to "commute" g from one side of the chain of function compositions to the other. (d) σ_M(a) = b ↦ γ(a, b). It is easy to verify that it is a homomorphism. The Barr-dinaturality condition: assuming f(m) = n for some m ∈ M and n ∈ N, and a homomorphism f : M → N, it is the case that, omitting the U functor, RfN(σ_N(n)) = RfN(σ_N(f(m))) = b ↦ γ(f(m), f(b)) = b ↦ f(γ(m, b)) = RMf(σ_M(m)), where the equalities can be explained respectively as: assumption in the definition of Barr-dinaturality, unfolding definitions, homomorphism, unfolding definitions. (e) ρ_M(f) = f(ε). It is easy to show that it is Barr-dinatural; note that we need to use the fact that *T*-morphisms (that is, monoid homomorphisms) preserve ε. (f) We define I_X = X, while run_{X,i}(f) = f(i).
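In code (our own sketch, phrased with Haskell's `Monoid` class rather than the paper's set-theoretic setting), σ sends a to the endofunction γ(a, −), and ρ applies an endofunction to ε:

```haskell
import Data.Monoid (Sum (..))

-- The Cayley representation of a monoid m inside the monoid of
-- endofunctions m -> m: sigma is a homomorphism, rho its retraction.
sigma :: Monoid m => m -> (m -> m)
sigma a = \b -> a <> b

rho :: Monoid m => (m -> m) -> m
rho f = f mempty
```

For example, `rho (sigma (Sum 3))` recovers `Sum 3`, and `sigma (a <> b) = sigma a . sigma b` is the homomorphism condition of point (d).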

The first main result of this paper states that given a tight representation of *T* with respect to F U, the monad given by the composition UF can be alternatively defined using a continuation-like monad constructed with sets of Barr-dinatural transformations:

**Theorem 4.** *For a tight Cayley representation* R *with respect to* F ⊣ U*, elements of the set* UFA *are in 1-1 correspondence with Barr-dinatural transformations of the type* (A → RXX) → RXX*. In particular, this means that the latter form a set. Moreover, this correspondence gives a monad isomorphism between* UF *and the evident continuation-like structure on* (2)*, given by the unit* (η_A(a))_X(f) = f(a) *and the Kleisli extension* (f*(k))_X(g) = k_X(a ↦ (f(a))_X(g))*.*

We denote the set of all Barr-dinatural transformations from the bifunctor (X, Y) ↦ (A → RXY) to R as ∀X.(A → RXX) → RXX. This gives us a monad similar in shape to the continuation monad, or, more generally, Kock's codensity monad [12] embodied using the formula for right Kan extensions as ends. One important difference with the codensity monad (except, of course, the fact that we have bifunctors on the right-hand sides of arrows) is that we use Barr-dinatural transformations instead of the usual dinatural transformations [3]. Indeed, if we use ends instead of ∀, the end ∫_X (A → RXX) → RXX is given as the collection of all dinatural transformations of the given shape. It is known, however, that even in the simple case when A = 1 and RXY = X → Y, this collection is too big to be a set (see the discussion in [16]), hence such an end does not exist.

### **3 Multi-sorted Equational Theories with a Main Sort**

The equational theories that we generate in Sect. 4 are multi-sorted, which is useful for trimming down the combinatorial complexity of the result. This turns out to be, in our view, essential in understanding what computational effects they actually represent. In this section, we give a quick overview of what kind of equational theories we work with, and discuss the construction of their free algebras.

We need to discuss the free algebras here, since we want the freeness to be with respect to a forgetful functor to **Set**, rather than to the usual category of sorted sets; compare [26]. This is because we want the equational theories to generate monads on **Set**, as described in the previous section. In particular, we are interested in theories in which one of the sorts is chosen as the *main* one, and work with the functor that forgets not only the structure, but also the carriers of all the other sorts, only preserving the main one. Luckily, this functor can be factored as a composition of two forgetful functors, each with an obvious left adjoint.

In detail, assume a finite set of sorts S = {Ω, K₁, ..., K_d} for some d ∈ ℕ, where Ω is the main sort. The category of sorted sets is simply the category **Set**^{|S|}, where |S| is the discrete category generated by the set S. More explicitly, the objects of **Set**^{|S|} are tuples of sets (one for each sort), while morphisms are tuples of functions. Given an S-sorted finitary theory T, we denote the category of its algebras as T-Alg. To see that the forgetful functor from T-Alg to **Set** has a left adjoint, consider the following composition of adjunctions:

$$\mathbf{Set} \underset{(X,\, A_1,\, \dots,\, A_d)\, \mapsto\, X}{\overset{X\, \mapsto\, (X,\, \emptyset,\, \dots,\, \emptyset)}{\rightleftarrows}} \mathbf{Set}^{|S|} \underset{\text{forget the structure}}{\overset{\text{free algebra}}{\rightleftarrows}} T\text{-Alg}$$

This means that the free algebra for each sort has the carrier given by the set of terms of the given sort (with variables appearing only at positions intended for the main sort Ω) quotiented by the congruence induced by the equations. This kind of composition of adjunctions is similar to [18], but in our case the compound right adjoint is monadic for the theories given in the next section.

#### **4 Theories from Polynomial Cayley Representations**

In this section, we introduce algebraic theories that are tightly Cayley-represented by PX → X for a polynomial functor P. Notation-wise, whenever we write i ≤ k for a fixed k ∈ ℕ, we mean that i is a natural number in the range 1, ..., k, and use [x_i]_{i≤k} to denote a sequence x₁, ..., x_k. The latter notation is used also in arguments of functions and operations, so f([x_i]_{i≤k}) means f(x₁, ..., x_k), while f(x, [y_i]_{i≤k}) means f(x, y₁, ..., y_k). We sometimes use double indexing; for example, by ∏_{i=1}^{k} ∏_{j=1}^{t_i} X_{i,j} → Y for some [t_i]_{i≤k}, we mean the type X_{1,1} × ··· × X_{1,t₁} × ··· × X_{k,1} × ··· × X_{k,t_k} → Y. This is matched by a double-nested notation in arguments, that is, f([[x_i^j]_{j≤t_i}]_{i≤k}) means f(x₁¹, ..., x₁^{t₁}, ..., x_k¹, ..., x_k^{t_k}). Also, whenever we want to repeat an argument k times, we write [x]_k; for example, f([x]₃) means f(x, x, x). Because we use a lot of sub- and superscripts as indices, we do not use the usual notation for exponentiation. This means that x^i always denotes some x at index i, while a k-fold product of a type X, ordinarily denoted X^k, is written as ∏^k X. We use the ⟦−⟧ brackets to denote the interpretation of sorts and operations in an algebra (that is, a model of the theory). If the algebra is clear from the context, we skip the brackets in the interpretation of operations.

For the rest of the paper, let d ∈ ℕ (the number of monomials in the polynomial) and sequences of natural numbers [c_i]_{i≤d} and [e_i]_{i≤d} (the coefficients and exponents, respectively) define the following polynomial endofunctor on **Set**:

$$PX = \sum\_{i=1}^{d} c\_i \times \prod^{e\_i} X,\tag{3}$$

where c_i is overloaded notation for the set {1, ..., c_i}. With this data, we define the following equational theory:

**Definition 5.** *Assuming* d, [c_i]_{i≤d}, *and* [e_i]_{i≤d} *as above, we define the following equational theory* T*:*

*– Sorts:*

Ω (main sort)

K_i, for all i ≤ d

*– Operations:*

$$\begin{aligned} \mathsf{cons} &: \prod_{i=1}^{d} \prod^{c_i} K_i \to \Omega \\ \pi_i^j &: \Omega \to K_i, \text{ for } i \le d \text{ and } j \le c_i \\ \varepsilon_i^j &: K_i, \text{ for } i \le d \text{ and } j \le e_i \\ \gamma_i^j &: K_j \times \prod^{e_j} K_i \to K_i, \text{ for } i, j \le d \end{aligned}$$

*– Equations:*

$$\pi\_i^j(\mathsf{cons}([[x\_i^j]\_{j \le c\_i}]\_{i \le d})) = x\_i^j \tag{\text{beta-}\pi}$$

$$\mathsf{cons}([[\pi_i^j(x)]_{j \le c_i}]_{i \le d}) = x \tag{\text{eta-}\pi}$$

$$\gamma_i^j(\varepsilon_j^k, [x_t]_{t \le e_j}) = x_k \tag{\text{beta-}\varepsilon}$$

$$\gamma_i^i(x, [\varepsilon_i^j]_{j \le e_i}) = x \tag{\text{eta-}\varepsilon}$$

$$\gamma\_i^j(\gamma\_j^k(x, [y\_t]\_{t \le e\_k}), [z\_s]\_{s \le e\_j}) = \gamma\_i^k(x, [\gamma\_i^j(y\_t, [z\_s]\_{s \le e\_j})]\_{t \le e\_k}) \qquad \text{(assoc-}\gamma\text{)}$$

Thus, in the theory T, there is a main sort Ω, which we think of as corresponding to the entire functor, and one sort K_i for each "monomial" ∏^{e_i} X. Then, we can think of Ω as a tuple containing elements of each sort, where each sort K_i has exactly c_i occurrences. The fact that Ω is a tuple, which is witnessed by the cons and π operations equipped with the standard equations for tupling and projections, is not too surprising—one should keep in mind that T is a theory represented by the type PX → X, which can be equivalently given as the *product* of function spaces c_i × ∏^{e_i} X → X for all i ≤ d.

Each operation γ_i^j can be used to compose an element of K_j and e_j elements of K_i to obtain an element of K_i. The ε constants can be seen as selectors: in (beta-ε), ε_j^k in the first argument of γ_i^j selects the k-th argument of the sort K_i, while the (eta-ε) equation states that composing a value of K_i with the successive selectors of K_i gives back the original value. The equation (assoc-γ) states that the composition of values is associative in an appropriate sense. In Sect. 5, we provide a reading of the theory T as a specification of a computational effect for different choices of d, c_i, and e_i.

*Remark 6.* If it is the case that e_i = e_j for some i, j ≤ d, then the sorts K_i and K_j are isomorphic. This means that in every algebra of such a theory, there is an isomorphism of sorts ϕ : ⟦K_i⟧ → ⟦K_j⟧, given by ϕ(x) = γ_j^i(x, [ε_j^k]_{k≤e_i}). This suggests an alternative setting, in which instead of having a single c_i × ∏^{e_i} X component, we can have c_i components of the shape ∏^{e_i} X. In such a setting, the equational theory T in Definition 5 would be slightly simpler—specifically, there would be no need for double-indexing in the types of cons and π. On the downside, this would obfuscate the connection with computational effects described in Sect. 5 and some conjured extensions in Sect. 7.

The theory T has a tight Cayley representation using functions from P, as detailed in the following theorem. This gives us the second main result of this paper: by Theorem 4, the theory T is the equational theory of the monad (1). The notation in_i means the i-th inclusion of the coproduct in the functor P.

**Theorem 7.** *The equational theory* T *from Definition 5 is tightly Cayley-represented by the following data:*

*– The bifunctor:* RXY = PX → Y

*– Carriers of sorts:*

$$\llbracket \Omega \rrbracket = RXX \qquad\qquad \llbracket K_i \rrbracket = \prod^{e_i} X \to X$$

*– Interpretation of operations:*

$$\begin{aligned} \llbracket\mathsf{cons}\rrbracket([[f_k^j]_{j \le c_k}]_{k \le d})(\mathsf{in}_i(c, [x_t]_{t \le e_i})) &= f_i^c([x_t]_{t \le e_i}) \\ \llbracket\pi_i^j\rrbracket(f)([x_t]_{t \le e_i}) &= f(\mathsf{in}_i(j, [x_t]_{t \le e_i})) \\ \llbracket\varepsilon_i^j\rrbracket([x_t]_{t \le e_i}) &= x_j \\ \llbracket\gamma_i^j\rrbracket(f, [g_k]_{k \le e_j})([x_t]_{t \le e_i}) &= f([g_k([x_t]_{t \le e_i})]_{k \le e_j}) \end{aligned}$$

*– The homomorphism* σ_M *for the main sort and sorts* K_i*:*

$$\begin{aligned} \sigma_M^{\Omega}(m)(\mathsf{in}_i(c, [x_t]_{t \le e_i})) &= \mathsf{cons}([[\gamma_k^i(\pi_i^c(m), [\pi_k^j(x_t)]_{t \le e_i})]_{j \le c_k}]_{k \le d})\\ \sigma_M^i(s)([x_t]_{t \le e_i}) &= \mathsf{cons}([[\gamma_k^i(s, [\pi_k^j(x_t)]_{t \le e_i})]_{j \le c_k}]_{k \le d}) \end{aligned}$$

*– The transformation* ρM*:*

$$\begin{aligned} \rho_M(f) &= \mathsf{cons}([[\pi_k^j(f(\mathsf{in}_k(j, [\mathsf{cons}([w_r^f]_{r < k}, [\varepsilon_k^t]_{c_k}, [w_r^f]_{k < r \le d})]_{t \le e_k})))]_{j \le c_k}]_{k \le d})\\ \text{where } w_r^f &= [\pi_r^c(f(\mathsf{in}_r(c, [\varepsilon_r^j]_{j \le e_r})))]_{c \le c_r} \end{aligned}$$

*– The set of indices* I_X = PX *and the functions* run_{X,i}(f) = f(i)*.*

In the representing algebra, it is the case that each ⟦K_i⟧ represents one monomial, as mentioned in the description of T, while ⟦Ω⟧ is the appropriate tuple of representations of monomials, which is encoded as a single function from a coproduct (in our opinion, this encoding turns out to be much more readable on paper), while cons and π are indeed given by tupling and projections. For each i ≤ d, the function ε_i^j simply returns its j-th argument, while γ is interpreted as the usual composition of multi-argument functions.

Homomorphisms between multi-sorted algebras are defined as operation-preserving functions for each sort, so σ is defined for the sort Ω and for each sort K_i. In general, the point of Cayley representations is to encode an element m of an algebra M using its possible behaviours with other elements of the algebra. It is no different here: for each sort K_i at the c-th occurrence in the tuple, the function σ^Ω packs (using cons) all possible compositions (by means of γ) of values of K_i with the "components" of m (extracted using π). The same happens for each s ∈ ⟦K_i⟧ in σ_M^i(s), but there is no need to unpack s, as it is already a value of a single sort.

The transformation ρ_M is a bit more complicated. The argument f is, in general, a function from a coproduct to M, but we cannot simply apply f to one value in_i(...) for some sort K_i, as we would obviously lose the information about the components in different sorts. This is why we need to apply f to all possible sorts with ε in the right place to ensure that we recover the original value. We extract the information about particular sorts from such values, and combine them using cons. Interestingly, the elements of w_r^f could actually be replaced by any expression of the appropriate sort that is preserved by homomorphisms, assuming that f is also preserved. This is needed to ensure that ρ is Barr-dinatural (the fact that f is preserved by homomorphisms is exactly the assumption in the definition of Barr-dinaturality). For example, if e_r > 0 for some r ≤ d, one can define w_r^f simply as [ε_r^j]_{c_r} for some j ≤ e_r. The complicated expression in the definition of w_r^f is a way to produce values also for sorts K_r with e_r = 0, which do not have any ε constants.

#### **5 Effects Modeled by Polynomial Representations**

Now we describe what kind of computational effects are captured by the theories introduced in the previous section. It turns out that they all are different compositions of finite mutable state and backtracking nondeterminism. These compositions include the two most basic ones: when the state is *local* for each nondeterministic branch, and when it is *global* to the entire computation.

In the following, if there is only one object of a given kind, we skip the indices. For example, if for some i, it is the case that e_i = 1, we write ε_i instead of ε_i^1. If d = 1, we skip the subscripts altogether.

#### **5.1 Backtracking Nondeterminism via Monoids**

We recover the original Cayley theorem for monoids by instantiating Theorem 7 with PX = X, that is, d = 1 and c₁ = e₁ = 1. In this case, we obtain two sorts, Ω and K, while the equations (beta-π) and (eta-π) instantiate respectively as follows:

$$
\pi(\mathsf{cons}(x)) = x, \quad \mathsf{cons}(\pi(x)) = x
$$

This means that both sorts are isomorphic, so one can think of this theory as being single-sorted. Of course, this is always the case if d = 1 and c₁ = 1. Since e₁ = 1, the operation γ is binary and there is a single ε constant. The equations (beta-ε) and (eta-ε) say, respectively, that ε is the left and right unit of γ, that is:

$$\gamma(\varepsilon, x) = x, \quad \gamma(x, \varepsilon) = x$$

Interestingly, the two unit laws for monoids are symmetrical, but in general the (beta-ε) and (eta-ε) equations are not. One should note that the symmetry is already broken when one implements free monoids (that is, lists) in a programming language: in the usual right-nested implementation, the "beta" rule is part of the definition of the append function, while the "eta" rule is a theorem. The (assoc-γ) equation instantiates as the associativity of γ:

$$\gamma(\gamma(x,y),z) = \gamma(x,\gamma(y,z))$$
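Concretely (our own sketch, in the Haskell rendering of the type (1)), the instance PX = X is the Church encoding of lists, with γ as append and ε as the empty list:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Church-encoded lists: the instance of (1) with P X = X.
-- gamma is append and eps the empty list; the monoid laws hold
-- up to the usual parametricity argument.
newtype L a = L { unL :: forall x. (a -> x -> x) -> x -> x }

eps :: L a
eps = L (\_ z -> z)

single :: a -> L a
single a = L (\c z -> c a z)

gamma :: L a -> L a -> L a
gamma (L m) (L n) = L (\c z -> m c (n c z))

toList :: L a -> [a]
toList (L m) = m (:) []
```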

#### **5.2 Finite Mutable State**

For n ∈ ℕ, if we take PX = n, that is, d = 1, c₁ = n and e₁ = 0, we obtain the equational theory of a single mutable cell in which the set of possible states is {1, ..., n}. There are two sorts in the theory: Ω and K. The sort K does not have any interesting structure on its own, as there are no constants ε, and the equation (eta-ε) instantiates to

$$
\gamma(x) = x,
$$

which means that γ is necessarily an identity. The fact that this theory is indeed the theory of state becomes apparent when we identify Ω as a sort of computations that require some initial state to proceed, and K as computations that produce a final state. Then, the operations π^j : Ω → K (j ≤ n) are the "update" operations, where π^j sets the current state to j, while cons : ∏^n K → Ω is the "lookup" operation, in which the j-th argument is the computation to be executed if the current state is j. The equations (beta-π), for all j ≤ n, and (eta-π) state respectively:

$$
\pi^j(\mathsf{cons}([x_i]_{i \le n})) = x_j, \quad \mathsf{cons}([\pi^i(x)]_{i \le n}) = x
$$

These equations embody the natural behaviour rules for this limited form of state. The former reads that setting the current state to j and then proceeding with the computation x<sup>i</sup> if the current state is i is the same thing as simply proceeding with x<sup>j</sup> (note that x<sup>j</sup> is of the sort K, hence it does not use the information that the current state has just been updated to j, so there is no need to keep the π<sup>j</sup> operation on the right-hand side of the equation). The latter states that if the current state is i and we set the current state to i, it is the same thing as not changing the state at all (note that x does not depend on the current state, as it is the same in every argument of cons).

Interestingly, the presentations of equational theories for state in the literature (for example, [7,23]) are all single-sorted. Such a setting can be recovered by defining the following macro-operations on the sort Ω: 

$$\begin{aligned} \mathtt{put}^j &: \Omega \to \Omega \\ \mathtt{put}^j(x) &= \mathtt{cons}([\pi^j(x)]\_n) \end{aligned} \qquad \begin{aligned} \mathtt{get} &: \prod^n \Omega \to \Omega \\ \mathtt{get}([x\_i]\_{i \le n}) &= \mathtt{cons}([\pi^i(x\_i)]\_{i \le n}) \end{aligned}$$

The trick here is that the get operation does not change the state (by setting the new state to the current one), while put does not depend on the current state (by having the same computation in every argument of cons). The usual four equations for the interaction of put and get can be obtained by unfolding the definitions and using the (beta-π) and (eta-π) equations:

$$\begin{aligned} \mathtt{put}^j(\mathtt{put}^k(x)) &= \mathtt{put}^k(x) & \mathtt{put}^j(\mathtt{get}([x\_i]\_{i \le n})) &= \mathtt{put}^j(x\_j) \\ \mathtt{get}([\mathtt{get}([x\_i]\_{i \le n})]\_n) &= \mathtt{get}([x\_i]\_{i \le n}) & \mathtt{get}([\mathtt{put}^i(x\_i)]\_{i \le n}) &= \mathtt{get}([x\_i]\_{i \le n}) \end{aligned}$$
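These macro-operations can be checked against a concrete model. The sketch below is ours, not code from the paper: the main sort Ω is modeled as functions out of the state set (here simply `Int`-valued states), so that `cons` is abstraction over the current state and `π^j` is application to j.

```haskell
-- A hand-rolled model of the two-sorted state theory:
-- K produces a result and a final state; Omega awaits an initial state.
type K a = (a, Int)
type Omega a = Int -> K a

consS :: (Int -> K a) -> Omega a   -- "lookup": one branch per state
consS = id

piS :: Int -> Omega a -> K a       -- "update": run with state j
piS j x = x j

-- The macro-operations from the text:
putS :: Int -> Omega a -> Omega a
putS j x = consS (\_ -> piS j x)   -- ignore the current state

getS :: (Int -> Omega a) -> Omega a
getS k = consS (\i -> piS i (k i)) -- pass the state on unchanged
```

With this model, the four put/get laws above hold extensionally, e.g. `putS j (putS k x)` and `putS k x` agree on every initial state.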

The connection with the implementation of state in programming becomes evident when we take a closer look at the endofunctor of the induced monad from Theorem 4. Consider the following informal calculation:

$$\begin{aligned} &\forall X.(A \to n \to X) \to n \to X \\ \cong\ & \forall X.\, n \to (A \to n \to X) \to X && (\text{reorder arguments}) \\ \cong\ & n \to \forall X.(A \to n \to X) \to X && (\forall \text{ commutes with arrows}) \\ \cong\ & n \to \forall X.(A \times n \to X) \to X && (\text{currying}) \\ \cong\ & n \to A \times n && (\text{Church encoding of products}) \end{aligned}$$

This means that not only do we prove that the equational theory corresponds to the usual state monad, but we can actually *derive* the implementation of state as the endofunctor A ↦ (n → A × n).

#### **5.3 Backtracking with Local State**

We obtain one way to combine nondeterminism with state using the functor PX = n × X, for n ∈ ℕ, that is, d = 1, c₁ = n and e₁ = 1. It has two sorts, Ω and K, which play roles similar to those detailed in the previous section. However, this time K additionally has the structure of a monoid. This gives us the theory of backtracking with *local* state, which means that whenever we make a choice using the γ operation, the computations in each argument carry separate, non-interfering states. In particular, in a computation γ(x, y), both subcomputations x and y start with the same state, which is the initial state of the entire computation. This non-interference is guaranteed simply by the system of sorts: the arguments of γ are of the sort K, which means that the stateful computations inside the arguments begin with π, which sets a new state.

We can also obtain a single-sorted theory, similar to the case of the pure state. To the put and get macro-operations, we add choice and failure as follows:

$$\begin{aligned} \mathsf{choose} &: \Omega \times \Omega \to \Omega \\ \mathsf{choose} &(x, y) = \mathsf{cons}([\gamma(\pi^j(x), \pi^j(y))]\_{j \le n}) \end{aligned} \qquad \begin{aligned} \mathsf{fail} &: \Omega \\ \mathsf{fail} &= \mathsf{cons}([\varepsilon]\_n) \end{aligned}$$

Then, the locality of state can be summarised by the following equality, which is easy to show using the (beta-π) and (eta-π) equations:

$$\mathsf{put}^k(\mathsf{choose}(x, y)) = \mathsf{choose}(\mathsf{put}^k(x), \mathsf{put}^k(y))$$
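This locality law can be observed in a concrete model (our own sketch, written by hand to avoid library dependencies): the usual "state inside nondeterminism" monad s → [(a, s)], in which each branch of a choice receives its own copy of the state.

```haskell
-- Backtracking with local state, modeled as s -> [(a, s)]:
-- both branches of a choice start from the same incoming state,
-- so an update in one branch never leaks into the other.
type Local s a = s -> [(a, s)]

failL :: Local s a
failL = \_ -> []

chooseL :: Local s a -> Local s a -> Local s a
chooseL x y = \s -> x s ++ y s

putL :: s -> Local s a -> Local s a
putL k x = \_ -> x k
```

Here `putL k (chooseL x y)` and `chooseL (putL k x) (putL k y)` agree definitionally, which is precisely the equality displayed above.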

#### **5.4 Backtracking with Global State**

Another way to compose nondeterminism and state is by using *global* state, which is obtained for n ∈ ℕ and PX = ∏^n X, that is, d = 1, c₁ = 1, and e₁ = n. As in the case of pure backtracking nondeterminism, this means that the sorts Ω and K are isomorphic. The intuitive understanding of the expression γ(x, [y_i]_{i≤n}) is: first perform the computation x, and then the computation y_i, where i is the final state of the computation x. The operation ε^j is: fail, but set the current state to j. In this case, the equations (beta-ε) instantiate to the following for all j ≤ n:

$$\gamma(\varepsilon^j, [y\_i]\_{i \le n}) = y\_j$$

It states that if the first computation fails but sets the state to j, the next step is to try the computation y_j. Note that there is no way to set a new state other than via failure, but this can be circumvented using γ(x, [ε^k]_n) to set the state to k after performing x. The (eta-ε) equation instantiates to:

$$\gamma(x, [\varepsilon^j]\_{j \le n}) = x$$

This reads that if we execute x and then set the current state to the resulting state of x, it is the same as just executing x.
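The difference between the two combinations can be illustrated with a small executable sketch. The encodings below are standard presentations of local and global state (local state behaves like Haskell's `StateT s []`; global state threads a single state through all branches), not the paper's Cayley construction, and all names are ours.

```python
# Backtracking with LOCAL state: a computation maps a state to a list of
# (result, final state) pairs; each branch of a choice restarts from the
# state at the choice point.
def l_ret(a):        return lambda s: [(a, s)]
def l_fail():        return lambda s: []
def l_choose(x, y):  return lambda s: x(s) + y(s)
def l_put(k, m):     return lambda s: m(k)      # continue with state k

# Backtracking with GLOBAL state: a computation maps a state to a list of
# results together with ONE final state; the state left by the first
# branch is the state the second branch starts from.
def g_ret(a):        return lambda s: ([a], s)
def g_fail():        return lambda s: ([], s)
def g_put(k, m):     return lambda s: m(k)
def g_choose(x, y):
    def run(s):
        r1, s1 = x(s)
        r2, s2 = y(s1)       # y sees the state produced by x
        return r1 + r2, s2
    return run

# Locality: put^k(choose(x, y)) = choose(put^k(x), put^k(y)) holds locally...
x = lambda s: [(s, s + 1)]
y = lambda s: [(s, s)]
assert l_put(7, l_choose(x, y))(0) == l_choose(l_put(7, x), l_put(7, y))(0)

# ...but fails globally: on the left, y starts from the state left by x;
# on the right, the state is reset to 7 before y runs.
gx = lambda s: ([s], s + 1)
gy = lambda s: ([s], s)
assert g_put(7, g_choose(gx, gy))(0) != g_choose(g_put(7, gx), g_put(7, gy))(0)
```

Running the global-state example, the left-hand side yields `([7, 8], 8)` while the right-hand side yields `([7, 7], 7)`, which is exactly the failure of the locality equation above.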

#### **6 Direct-Style Implementation**

Free algebras of the theory T from Definition 5 can also be presented as terms of a certain shape. They are best described as terms built using the operations from T that are well-typed according to the following typing rules, where the types are called Ω, K_i, and P_i for i ≤ d. The type of the entire term is Ω, and Var(x) means that x is a variable.

$$\frac{[[t_i^j : K_i]_{j \le c_i}]_{i \le d}}{\mathsf{cons}([[t_i^j]_{j \le c_i}]_{i \le d}) : \Omega} \qquad \varepsilon_i^j : K_i \qquad \frac{t : P_j \quad [w_k : K_i]_{k \le e_j}}{\gamma_i^j(t, [w_k]_{k \le e_j}) : K_i} \qquad \frac{\mathrm{Var}(x)}{\pi_i^j(x) : P_i}$$

Note that even though variables appear as arguments to the operations π, they are not of the type Ω. This means that the entire term cannot be a variable, as it is always constructed with cons as the outermost operation. Each argument of cons is a term of the type K_i for an appropriate i, which is built out of the operations ε and γ. Note that the first argument of γ is always a variable wrapped in π, while all the other arguments are again terms of the type K_i. Overall, such terms can be captured as the following endofunctors on **Set**, where W^i represents terms of the type K_i, while W^Ω represents terms of the type Ω. By μY.GY we mean the carrier of the initial algebra of an endofunctor G.


$$W^i X = \mu Y.\, e_i + \sum_{j=1}^{d} \Big(\sum^{c_j} X\Big) \times \prod^{e_j} Y$$

$$W^\Omega X = \prod_{i=1}^{d} \prod^{c_i} W^i X$$

Clearly, e_i in the definition of W^i represents the ε_i constants, while the second component of the coproduct is a choice between the γ_i operations with appropriate arguments.

Every term of the sort Ω can be normalised to a term of the type Ω by the term-rewriting system obtained by orienting the "beta" and "assoc" equations left to right and eta-expanding variables at the top level:

$$\begin{aligned} &\pi_i^j \big( \mathsf{cons} \big( [[x_i^j]_{j \le c_i}]_{i \le d} \big) \big) \leadsto x_i^j\\ &\gamma_i^j \big( \varepsilon_j^k, [x_t]_{t \le e_j} \big) \leadsto x_k\\ &\gamma_i^j \big( \gamma_j^k \big( x, [y_t]_{t \le e_k} \big), [z_s]_{s \le e_j} \big) \leadsto \gamma_i^k \big( x, [\gamma_i^j \big( y_t, [z_s]_{s \le e_j} \big)]_{t \le e_k} \big) \end{aligned}$$

$$x \leadsto \mathsf{cons} \big([[\gamma_i^j (\pi_i^j (x), [\varepsilon_i^k]_{k \le e_i})]_{j \le c_i}]_{i \le d}\big)$$

This term rewriting system gives rise to a natural implementation of the monadic structure, where the "beta" and "assoc" rules normalise the two-level term structure, thus implementing the monadic multiplication, while the eta-expansion rule implements the monadic unit.
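For a single-sorted instance (d = 1, with c components and e continuations), this normalisation can be turned into a small interpreter. The representation and all names below are ours; `join` implements the monadic multiplication via the "beta" and "assoc" rules, while `unit` is the top-level eta-expansion.

```python
# Normal forms for d = 1: an Omega-term is a list of c K-terms; a K-term
# is either a constant ('eps', k) with k < e, or ('gam', (v, j), conts)
# with v a variable, j < c (the variable wrapped in pi^j), and conts a
# list of e K-terms.
C, E = 2, 2  # c_1 = 2 components, e_1 = 2 continuations

def unit(v):
    # eta-expansion: v ~> cons([gamma(pi^j(v), [eps^k]_k)]_j)
    return [('gam', (v, j), [('eps', k) for k in range(E)]) for j in range(C)]

def graft(k, conts):
    # beta-eps: gamma(eps^k, conts) ~> conts[k]; "assoc" pushes conts inwards
    if k[0] == 'eps':
        return conts[k[1]]
    _, head, ys = k
    return ('gam', head, [graft(y, conts) for y in ys])

def flatten(k):
    # k is a K-term whose variables are themselves Omega normal forms
    if k[0] == 'eps':
        return k
    _, (omega, j), conts = k
    return graft(omega[j], [flatten(y) for y in conts])   # beta-pi, then graft

def join(outer):
    # monadic multiplication: normalise a two-level term
    return [flatten(k) for k in outer]

def map_var(omega, f):
    # functorial action on the variables in the gamma heads
    def go(k):
        if k[0] == 'eps':
            return k
        _, (v, j), ys = k
        return ('gam', (f(v), j), [go(y) for y in ys])
    return [go(k) for k in omega]

# The monad laws follow from the (beta) and (eta) equations:
m = [('eps', 0), ('gam', ('v', 1), [('eps', 1), ('eps', 0)])]
assert join(unit(m)) == m              # right unit law (eta-pi, eta-eps)
assert join(map_var(m, unit)) == m     # left unit law
```

The two assertions check the monad unit laws on a sample normal form: grafting the trivial continuations `[eps^0, ..., eps^{e-1}]` back into a term leaves it unchanged, which is exactly the (eta-ε) equation in action.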

### **7 Discussion**

The idea for employing Cayley representations to explore implementations of monads induced by equational theories is inspired by Hinze [8], who suggested a connection between codensity monads, Church representation of lists, and the Cayley theorem for monoids. We note that Hinze's discussion is informal, but he suggests using ends, which, as we discuss in Sect. 2, is not sound.

Most related work follows one of two main paths: it either concentrates on an algebraic explanation of monads already used in programming and semantics (for example, [11,19,23]), or on the general connection between different kinds of algebraic theories and computational effects, without much interest in whether this leads to structures implementable in a programming language. Some exceptions are the construction of the sum of a theory and a free theory [9] and the sum of ideal monads [6]. What we propose in Sect. 4 is a form of "functional combinatorics": given a type, what kind of algebra describes the possible values?

As our approach veers off the main paths of the recent work on effects, there are many possible directions of future work. One interesting direction would be to generalise **Set**, the base category used throughout this paper, to more abstract categories. After all, we want to talk about structures definable only in terms of (co)products, exponentials, and quantifiers—which are all constructions whose universal properties are singled out and explored using (co)cartesian (or even monoidal) closed categories. However, the current development relies heavily on the particular properties of **Set**, such as extensional equality of functions, which appears in disguise in the condition (f) in Definition 2.

One can also try to extend the type used as a Cayley representation. For example, we could consider the polynomial P in (3) to range over the space of all sets, that is, allow the coefficients c_i to vary over sets rather than natural numbers. In the Cayley representation, it would be enough to consider functions from c_i in place of c_i-fold products. We would immediately gain expressiveness, as the obtained state monad would no longer need to be defined only for a finite set of possible states. On the flip side, this would make the resulting theory infinitary, which, of course, is not uncommon in the field of algebraic treatment of computational effects. However, we decide to stick to the simplest possible setting in this paper, which greatly simplifies the presentation, but still gives us some novel observations, like the fact that the theory of finite state is simply the theory of 2-sorted tuples in Sect. 5.2, or the novel theory of backtracking nondeterminism with global state in Sect. 5.4. Other future extensions that we believe are worth exploring include iterating the construction to obtain a form of distributive tensor (compare Rivas *et al.*'s [25] "double" representation of near-semirings) or quantifying over more variables, leading to less interaction between sorts.

**Acknowledgements.** We thank the reviewers for their insightful comments and suggestions.

Maciej Piróg was supported by the National Science Centre, Poland, under POLONEZ 3 grant "Algebraic Effects and Continuations" no. 2016/23/P/ST6/02217. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 665778.

Piotr Polesiuk was supported by the National Science Centre, Poland, under grant no. 2014/15/B/ST6/00619.

Filip Sieczkowski was supported by the National Science Centre, Poland, under grant no. 2016/23/D/ST6/01387.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **A Dialectica-Like Interpretation of a Linear MSO on Infinite Words**

Pierre Pradic1,2 and Colin Riba1(B)

<sup>1</sup> ENS de Lyon, Université de Lyon, LIP, UMR 5668 CNRS ENS Lyon UCBL Inria, Lyon, France colin.riba@ens-lyon.fr <sup>2</sup> Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland

**Abstract.** We devise a variant of the Dialectica interpretation of intuitionistic linear logic for LMSO, a linear logic-based version of MSO over infinite words. LMSO was known to be correct and complete w.r.t. Church's synthesis, thanks to an automata-based realizability model. Invoking the Büchi-Landweber Theorem and building on a complete axiomatization of MSO on infinite words, our interpretation provides us with a syntactic approach, without any further construction of automata on infinite words. Via Dialectica, as linear negation directly corresponds to switching players in games, we furthermore obtain a complete logic: either a closed formula or its linear negation is provable. This completely axiomatizes the theory of the realizability model of LMSO. Besides, this shows that, in principle, one can solve Church's synthesis for a given ∀∃-formula by only looking for proofs of either that formula or its linear negation.

**Keywords:** Linear logic · Dialectica interpretation · MSO on Infinite Words

#### **1 Introduction**

Monadic Second-Order Logic (MSO) over ω-words is a simple yet expressive language for reasoning on non-terminating systems, which subsumes non-trivial logics used in verification such as LTL (see e.g. [2,30]). MSO on ω-words is decidable by Büchi's Theorem [6] (see e.g. [24,29]), and can be completely axiomatized as a subsystem of second-order Peano arithmetic [28]. While MSO admits an effective translation to finite-state (Büchi) automata, it is a non-constructive logic, in the sense that it has true (*i.e.* provable) ∀∃-statements which can be witnessed by no continuous stream function.

On the other hand, Church's synthesis [8] can be seen as a decision problem for a strong form of constructivity in MSO. More precisely (see e.g. [12,32]),

This work was partially supported by the ANR-14-CE25-0007 - RAPIDO and Polish National Science Centre grant no. 2014/13/B/ST6/03595.

© The Author(s) 2019

M. Bojańczyk and A. Simpson (Eds.): FOSSACS 2019, LNCS 11425, pp. 470–487, 2019. https://doi.org/10.1007/978-3-030-17127-8_27

Church's synthesis takes as input a ∀∃-formula of MSO and asks whether it can be realized by a finite-state causal stream transducer. Church's synthesis has been known to be decidable since the Büchi-Landweber Theorem [7], which gives an effective solution to ω-regular games on finite graphs generated by ∀∃-formulae. In traditional (theoretical) solutions to Church's synthesis, the game graphs are induced from deterministic (say parity) automata obtained by McNaughton's Theorem [19]. Despite its long history, Church's synthesis has not yet been amenable to tractable solutions for the full language of MSO (see e.g. [12]).

In recent works [25,26], the authors suggested a Curry-Howard approach to Church's synthesis based on intuitionistic and linear variants of MSO. In particular, [26] proposed a system LMSO based on (intuitionistic) linear logic [13], in which, via a translation (−)^L : MSO → LMSO, the provable ∀∃(−)^L-statements exactly correspond to the realizable instances of Church's synthesis. Realizer extraction for LMSO is done via an external realizability model based on alternating automata, which amounts to seeing every formula ϕ(a) as a formula of the form (∃u)(∀x)ϕ_D(u, x, a), where ϕ_D represents a deterministic automaton.

In this paper, we use a variant of Gödel's "Dialectica" functional interpretation as a syntactic formulation of the automata-based realizability model of [26]. Dialectica associates to ϕ(a) a formula ϕ^D(a) of the form (∃u)(∀x)ϕ_D(u, x, a). In usual versions formulated in higher-types arithmetic (see e.g. [1,16]), the formula ϕ_D is quantifier-free, so that ϕ^D is a prenex form of ϕ. This prenex form is constructive, and a constructive proof of ϕ can be turned into a proof of ϕ^D with an explicit witness for ∃u. Even if Dialectica originally interprets intuitionistic arithmetic, it is structurally linear, and linear versions of Dialectica were formulated at the very beginning of linear logic [21–23] (see also [14,27]).

We show that the automata-based realizability model of [26] can be obtained by a suitable modification of the usual linear Dialectica interpretation, in which the formula ϕ_D essentially represents a deterministic automaton on ω-words and is in general not quantifier-free, and whose realizers are exactly the finite-state accepting strategies of the model of [26]. In addition to providing a syntactic extraction procedure with an internalized and automata-free correctness proof, this reformulation has a striking consequence, namely that there exists an extension LMSO(C) of LMSO which is complete in the sense that for each closed formula ϕ, it proves either ϕ or its linear negation ϕ ⊸ ⊥. Since LMSO(C) has realizers for all provable ∀∃(−)^L-statements, its completeness contrasts with the classical setting, in which, due to provable non-constructive statements, one cannot decide Church's synthesis by only looking for proofs of ∀∃-statements or their negations. Besides, LMSO(C) has a linear choice axiom which is realizable in the sense of both (−)^D and [26], but whose naive MSO counterpart is false.

The paper is organized as follows. We present our basic setting in Sect. 2, with a particular emphasis on particularities of (finite-state) causal functions to model strategies and realizers. Our variant of Dialectica and the corresponding linear system are discussed in Sect. 3, while Sect. 4 defines the systems LMSO and LMSO(C) and shows the completeness of LMSO(C).

#### **2 Preliminaries**

Alphabets (denoted Σ, Γ, etc.) are finite non-empty sets of the form **2**^p for some p ∈ N. We let **1** := **2**^0. Note that alphabets are closed under Cartesian products and set-theoretic function spaces. It follows that, taking o := **2**, we have an alphabet ⟦τ⟧ for each simple type τ ∈ ST, where

$$\sigma, \tau \in \mathrm{ST} \quad ::= \quad \mathbf{1} \;\mid\; o \;\mid\; \sigma \times \tau \;\mid\; \sigma \to \tau$$

We often write (τ)^σ for the type σ → τ. Given an ω-word (or stream) B ∈ Σ^ω and n ∈ N, we write B↾n for the finite word B(0)· ··· ·B(n − 1) ∈ Σ*.

**Church's Synthesis and Causal Functions.** Church's synthesis consists in the automatic extraction of stream functions from input-output specifications (see e.g. [12,31]). These specifications are in general asked to be ω-regular, or equivalently definable in MSO over ω-words. In practice, proper subsets of MSO (and even of LTL) are assumed (see e.g. [5,11,12]). As an example, the relation

$$(\exists^{\infty}k)B(k) \quad \Rightarrow \ (\exists^{\infty}k)C(k) \qquad \text{resp.} \qquad (\forall^{\infty}k)B(k) \quad \Rightarrow \ (\exists^{\infty}k)C(k) \quad \text{(1)}$$

with input B ∈ **2**^ω and output C ∈ **2**^ω specifies functions F : **2**^ω → **2**^ω such that F(B), seen as a subset of N via **2**^ω ≃ P(N), is infinite whenever B is infinite (resp. whenever the complement of B is finite). One may also additionally require the output to respect the transitions of some automaton. For instance, following [31], in addition to either case of (1) one can ask C ⊆ B and C not to contain two consecutive positions:

$$(\forall n)(C(n) \quad \Rightarrow \quad B(n)) \qquad \text{and} \qquad (\forall n)(C(n) \quad \Rightarrow \quad \neg C(n+1))\tag{2}$$

In any case, the realizers must be (finite-state) causal functions. A stream function F : Σ^ω → Γ^ω is causal (notation F : Σ →_S Γ) if it can produce a prefix of length n of its output from a prefix of length n of its input. Hence F is causal if it is induced by a map f : Σ^+ → Γ as follows:

F(B)(n) = f(B(0) · ... · B(n)) (for all B ∈ Σ^ω and all n ∈ N)

The finite-state (f.s.) causal functions are those induced by Mealy machines. A Mealy machine M : Σ → Γ is a DFA over the input alphabet Σ equipped with an output function λ : Q_M × Σ → Γ (where Q_M is the state set of M). Writing ∂* : Σ* → Q_M for the iteration of the transition function ∂ of M from its initial state, M induces a causal function via the map taking s·a ∈ Σ^+ (with last letter a ∈ Σ) to λ(∂*(s), a) ∈ Γ.

Causal and f.s. causal functions form categories with finite products. Let S be the category whose objects are alphabets and whose maps from Σ to Γ are causal functions F : Σ^ω → Γ^ω. Let M be the wide subcategory of S whose maps are the finite-state causal functions.<sup>1</sup>

<sup>1</sup> A subcategory D of C is *wide* if D has the same objects as C.
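As a concrete illustration, the sketch below runs a Mealy machine letter by letter and checks the causality property. The machine itself is our own toy example (not the one of Fig. 1): it outputs the XOR of the current and the previous input bit.

```python
# A Mealy machine M : 2 -> 2 given by (initial state, transition, output).
# This example machine outputs the XOR of the current and previous bit.
init  = 0
delta = lambda q, a: a            # remember the last input letter
lam   = lambda q, a: q ^ a        # output depends on state AND current letter

def causal(word):
    """The induced causal function: F(B)(n) = lam(delta*(B(0)..B(n-1)), B(n))."""
    q, out = init, []
    for a in word:
        out.append(lam(q, a))
        q = delta(q, a)
    return out

b1 = [1, 0, 1, 1, 0, 0]
b2 = [1, 0, 1, 0, 1, 1]                   # shares a prefix of length 3 with b1
assert causal(b1)[:3] == causal(b2)[:3]   # causality: the first 3 output
                                          # letters need only the first 3 inputs
```

The assertion witnesses causality: since `b1` and `b2` agree on their first three letters, the two output streams agree on their first three letters as well.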

**Fig. 1.** A Mealy machine (left) and an equivalent eager (Moore) machine (right).


**Proposition 1.** *The Cartesian product of* Σ_1, ..., Σ_n *(for* n ≥ 0*) in* S *and in* M *is given by the product of sets* Σ_1 × ··· × Σ_n *(so that* **1** *is terminal).*

**The Logic MSO(M).** Our specification language MSO(**M**) is an extension of MSO on ω-words with one function symbol for each f.s. causal function. More precisely, MSO(**M**) is a many-sorted first-order logic, with one sort for each simple type τ ∈ ST, and with one function symbol of arity (σ_1, ..., σ_n; τ) for each map σ_1 × ··· × σ_n →_M τ. A term t of sort τ (notation t^τ) with free variables among x_1^{σ_1}, ..., x_n^{σ_n} (we say that t is of arity (σ_1, ..., σ_n; τ)) thus induces a map ⟦t⟧ : σ_1 × ··· × σ_n →_M τ. Given a valuation x_i ↦ B_i ∈ S[**1**, σ_i] ≃ ⟦σ_i⟧^ω for i ∈ {1, ..., n}, we then obtain an ω-word

$$[\![t]\!] \circ \langle B_1, \dots, B_n \rangle \;\in\; \mathbb{S}[\mathbf{1}, [\![\tau]\!]] \;\simeq\; [\![\tau]\!]^\omega$$

MSO(**M**) extends MSO with quantifications ∃x^τ and ∀x^τ ranging over S[**1**, τ] ≃ ⟦τ⟧^ω, and with sorted equalities t^τ ≐ u^τ interpreted as equality in S[**1**, τ] ≃ ⟦τ⟧^ω. Write |= ϕ when ϕ holds in this model, called the *standard* model. The full definition of MSO(**M**) is deferred to Sect. 4.1.

An instance of Church's synthesis problem is given by a closed formula (∀x^σ)(∃u^τ)ϕ(u, x). A positive solution (or realizer) of this instance is a term t(x) of arity (σ; τ) such that (∀x^σ)ϕ(t(x), x) holds.

Proposition 1 implies that MSO(**M**) proves the following equations:

$$\pi_i\langle t_1, \dots, t_n\rangle \doteq_{\sigma_i} t_i \qquad \text{and} \qquad t \doteq_{\sigma_1 \times \dots \times \sigma_n} \langle \pi_1(t), \dots, \pi_n(t)\rangle \tag{3}$$

Hence each formula ϕ(a_1^{σ_1}, ..., a_n^{σ_n}) can be seen as a formula ϕ(a^{σ_1 × ··· × σ_n}).

**Eager Functions.** A causal function Σ →_S Γ is eager if it can produce a prefix of length n + 1 of its output from a prefix of length n of its input. More precisely, an eager F : Σ →_S Γ is induced by a map f : Σ* → Γ as

$$F(B)(n) \quad = \quad f(B(0)\cdot \ldots \cdot B(n-1)) \quad \quad \text{ (for all } B \in \Sigma^{\omega} \text{ and all } n \in \mathbb{N})$$

Finite-state eager functions are those induced by eager (Moore) machines (see also [11]). An eager machine E : Σ → Γ is a Mealy machine Σ → Γ whose output function λ : Q_E → Γ does not depend on the current input letter. An eager machine E : Σ → Γ induces an eager function via the map (s ∈ Σ*) ↦ (λ_E(∂*_E(s)) ∈ Γ).

We write F : Σ →_E Γ when F : Σ →_S Γ is eager, and F : Σ →_EM Γ when F is f.s. eager. All functions F : Σ →_M **1** and, more generally, constant functions F : Σ →_S Γ are eager. Note also that if F : Σ →_M Γ is eager, then F : Σ →_EM Γ. On the other hand, if F : Σ →_EM Γ is induced by an eager machine E, then F is finite-state causal, as it is induced by the Mealy machine with the same states and transitions as E, but with output function (q, a) ↦ λ_E(q).

Eager functions do not form a category since the identity of S is not eager. On the other hand, eager functions are closed under composition with causal functions.

**Proposition 2.** *If* F *is eager and* G, H *are causal, then* H ◦ F ◦ G *is eager.*

Isolating eager functions allows a proper treatment of strategies in games and of realizers w.r.t. the Dialectica interpretation. Since (Σ^+ → Γ) ≃ (Σ* → Γ^Σ), maps Σ →_E Γ^Σ are in bijection with maps Σ →_S Γ. This easily extends to machines. Given a Mealy machine M : Σ → Γ, let Λ(M) : Σ → Γ^Σ be the eager machine defined as M but with output map taking q ∈ Q_M to (a ↦ λ_M(q, a)) ∈ Γ^Σ.

*Example 2.* Recall the Mealy machine M : **2** → **2** of Ex. 1.(c). Then Λ(M) : **2** → **2**^**2** is the eager machine displayed in Fig. 1 (right, where the output is indicated within states).

Eager f.s. functions will often be used with the following notations. First, let @ be the pointwise lift to M of the usual application function Γ^Σ × Σ → Γ. We often write (F)G for @(F, G). Consider a Mealy machine M : Σ → Γ and the induced eager machine Λ(M) : Σ → Γ^Σ. We have

$$F_{\mathcal{M}}(B) \;=\; @(F_{\mathit{\Lambda}(\mathcal{M})}(B), B) \qquad\qquad \text{(for all } B \in \Sigma^{\omega})$$

Given F : Γ →_E Σ^Γ, we write **e**(F) for the causal @(F(−), −) : Γ →_S Σ. Given F : Γ →_S Σ, we write Λ(F) for the eager Γ →_E Σ^Γ such that F = **e**(Λ(F)). We extend these notations to terms.
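The passage from M to Λ(M) can be checked on a small sketch (our own encoding, reusing a toy XOR machine): the eager machine outputs, at each step, the function a ↦ λ(q, a) of the not-yet-read letter, and applying these functions pointwise to the input recovers F_M.

```python
# A Mealy machine as (init, delta, lam) with lam : Q x Sigma -> Gamma.
init  = 0
delta = lambda q, a: a
lam   = lambda q, a: q ^ a        # XOR of previous and current bit

def run_mealy(word):
    q, out = init, []
    for a in word:
        out.append(lam(q, a)); q = delta(q, a)
    return out

def run_eager(word):
    # Lambda(M): same states and transitions, but the output at state q is
    # the FUNCTION a |-> lam(q, a), produced before reading the next letter.
    q, out = init, []
    for a in word:
        out.append(lambda a, q=q: lam(q, a)); q = delta(q, a)
    return out

B = [1, 0, 1, 1, 0]
# F_M(B) = @(F_{Lambda(M)}(B), B): apply the n-th output function to B(n).
assert run_mealy(B) == [f(a) for f, a in zip(run_eager(B), B)]
```

Note the `q=q` default argument, which freezes the current state in each emitted closure; this is the Python idiom for building the per-state output function of Λ(M).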

Eager functions admit fixpoints similar to those of contractive maps in the topos of trees (see e.g. [4, Thm. 2.4]).

**Proposition 3.** *For each* F : Σ × Γ →_E Σ^Γ *there is a* fix(F) : Γ →_E Σ^Γ *s.t.*

$$\mathrm{fix}(F)(C) \;=\; F\big(\mathbf{e}(\mathrm{fix}(F))(C),\, C\big) \qquad\qquad \text{(for all } C \in \Gamma^{\omega})$$

*If* F *is induced by the eager machine* E : Σ × Γ → Σ^Γ*, then* fix(F) *is induced by the eager machine* H : Γ → Σ^Γ *defined as* E *but with* ∂_H : (q, b) ↦ ∂_E(q, ((λ_E(q))b, b))*.*
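The fixpoint construction can be sensed in a simplified, self-chosen setting: an eager F : Σ →_E Σ induced by f : Σ* → Σ has a unique stream fixpoint B with B(n) = f(B(0)···B(n − 1)), computable letter by letter because the n-th output letter never needs the n-th input letter (cf. contractive maps in the topos of trees).

```python
def eager_fix(f, n):
    """First n letters of the unique stream B with B(k) = f(B(0)..B(k-1)).

    Well-defined because f is eager: the k-th output letter depends only
    on the first k input letters, so we can bootstrap from the empty prefix.
    """
    b = []
    for _ in range(n):
        b.append(f(b))
    return b

# f flips the last letter produced so far (and starts with 0):
flip_last = lambda prefix: 1 - prefix[-1] if prefix else 0
assert eager_fix(flip_last, 6) == [0, 1, 0, 1, 0, 1]
```

Proposition 3 generalizes this idea to eager maps Σ × Γ →_E Σ^Γ, where the extra Γ-input is threaded through the fixpoint unchanged.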

**Games.** Traditional solutions to Church's synthesis turn specifications into infinite two-player games with ω-regular winning conditions. Consider an MSO(**M**) formula ϕ(u^τ, x^σ) with no free variables other than u, x. We see this formula as defining a two-player infinite game G(ϕ)(u^τ, x^σ) between the *Proponent* P (∃loïse), playing moves in ⟦τ⟧, and the *Opponent* O (∀bélard), playing moves in ⟦σ⟧. The Proponent begins, and then the two players alternate, producing an infinite play of the form

$$\chi \quad := \quad \mathfrak{u}\_0 \mathfrak{x}\_0 \cdots \mathfrak{u}\_n \mathfrak{x}\_n \cdots \quad \simeq \quad ((\mathfrak{u}\_k)\_k, (\mathfrak{x}\_k)\_k) \in [\![\tau] \!]^\omega \times [\![\sigma] \!]^\omega$$

The play χ is winning for P if ϕ((u_k)_k, (x_k)_k) holds. Otherwise χ is winning for O. Strategies for P resp. O in this game are functions

$$[\![\sigma]\!]^* \longrightarrow [\![\tau]\!] \qquad \text{resp.} \qquad [\![\tau]\!]^+ \longrightarrow [\![\sigma]\!] \;\simeq\; [\![\tau]\!]^* \longrightarrow [\![\sigma]\!]^{[\![\tau]\!]}$$

Hence finite-state strategies are represented by f.s. eager functions. In particular, a realizer of (∀x^σ)(∃u^τ)ϕ(u, x) in the sense of Church is a f.s. P-strategy in

$$\mathcal{G}(\varphi((u)x,x)) (u^{(\tau)\sigma}, x^{\sigma})$$

Most approaches to Church's synthesis reduce to the Büchi-Landweber Theorem [7], stating that games with ω-regular winning conditions are effectively determined, and that the winner always has a finite-state winning strategy. We will use the Büchi-Landweber Theorem in the following form. Note that an O-strategy in the game G(ϕ)(u^τ, x^σ) is a P-strategy in the game G(¬ϕ(u, (x)u))(x^{(σ)τ}, u^τ).

**Theorem 1 (**[7]**).** *Let* ϕ(u^τ, x^σ) *be an* MSO(**M**)*-formula with only* u, x *free. Then either there is an eager term* u(x) *of arity* (σ; τ) *such that* |= (∀x)ϕ(u(x), x)*, or there is an eager term* x(u) *of arity* (τ; (σ)τ) *such that* |= (∀u)¬ϕ(u, **e**(x)(u))*. It is decidable which case holds, and the terms are computable from* ϕ*.*

**Curry-Howard Approaches.** Following the complete axiomatization of MSO on ω-words of [28] (see also [26]), one can axiomatize MSO(**M**) with a deduction system based on arithmetic (see Sect. 4.1). Consider an instance of Church's synthesis (∀x^σ)(∃u^τ)ϕ(u, x). Then we get from Theorem 1 the alternative

$$\vdash\_{\mathsf{MSO}(\mathsf{M})} (\forall x) \varphi \big( \mathsf{e}(\mathsf{u})(x), x \big) \quad \text{or} \quad \vdash\_{\mathsf{MSO}(\mathsf{M})} (\forall u) \neg \varphi \big( (u)(\mathsf{x}(u)), \mathsf{x}(u) \big) \tag{4}$$

for an eager term u(x) or a causal term x(u). By enumerating proofs and machines, one thus gets a (naive) syntactic algorithm for Church's synthesis. It seems unlikely, however, that one can obtain a complete classical system in which the provable ∀∃-statements correspond exactly to the realizable instances of Church's synthesis, because MSO(**M**) has true but unrealizable ∀∃-statements. Besides, note that

$$\begin{array}{c} (\forall x^{\sigma}) \varphi(\mathbf{e}(\mathbf{u})(x),x) \quad \vdash\_{\mathsf{MSO}(\mathbf{M})} (\forall x^{\sigma}) (\exists u^{\tau}) \varphi(u,x) \\ (\forall u^{(\tau)\sigma}) \neg \varphi\big((u)(\mathbf{x}(u)),\mathbf{x}(u)\big) \quad \vdash\_{\mathsf{MSO}(\mathbf{M})} (\forall u^{(\tau)\sigma}) (\exists x^{\sigma}) \neg \varphi\big((u)x,x\big) \\ \neg(\forall x^{\sigma}) (\exists u^{\tau}) \varphi(u,x) \quad \vdash\_{\mathsf{MSO}(\mathbf{M})} (\forall u^{(\tau)\sigma}) (\exists x^{\sigma}) \neg \varphi\big((u)x,x\big) \end{array}$$

while it is possible both for realizable and unrealizable instances to have

$$\vdash_{\mathsf{MSO}(\mathbf{M})} \quad (\forall x^{\sigma})(\exists u^{\tau})\varphi(u, x) \quad \land \quad (\forall u^{(\tau)\sigma})(\exists x^{\sigma})\neg\varphi((u)x, x) \tag{5}$$

In previous works [25,26], the authors devised intuitionistic and linear variants of MSO on ω-words in which, thanks to automata-based polarity systems, proofs of suitably polarized existential statements correspond exactly to realizers for Church's synthesis. In particular, [26] proposed a system LMSO based on (intuitionistic) linear logic [13], such that via a translation (−)^L : MSO → LMSO, provable ∀∃(−)^L-statements exactly correspond to realizable instances of Church's synthesis, while (4) exactly corresponds to alternatives of the form

$$\vdash\_{\mathsf{LMSO}} (\forall x^{\sigma})(\exists u^{\tau}) \left[\varphi((u)x,x)\right]^{L} \text{ or } \vdash\_{\mathsf{LMSO}} (\forall u^{(\tau)\sigma})(\exists x^{\sigma}) \left[\neg\varphi((u)x,x)\right]^{L} \tag{6}$$

This paper goes further. We show that the automata-based realizability model of [26] can be obtained in a syntactic way, thanks to a (linear) Dialectica-like interpretation of a variant of LMSO, which turns a formula ϕ into a formula ϕ^D of the form (∃u)(∀x)ϕ_D(u, x), where ϕ_D(u, x) essentially represents a deterministic automaton. While the correctness of the extraction procedure of [25,26] relied on automata-theoretic techniques, we show here that it can be performed syntactically. Second, by extending LMSO with realizable axioms, we obtain a system LMSO(C) in which, using an adaptation of the usual *Characterization Theorem* for Dialectica (see e.g. [16]), alternatives of the form (6) imply that for a closed ϕ,

$$\vdash_{\mathsf{LMSO}(\mathsf{C})} \varphi \qquad \text{or} \qquad \vdash_{\mathsf{LMSO}(\mathsf{C})} \varphi \multimap \bot$$

where (−) ⊸ ⊥ is a *linear* negation. We thus get a complete *linear* system with extraction for suitably polarized ∀∃-statements. Such a system can of course not have a standard semantics, and indeed, LMSO(C) has a functional choice axiom

$$(\forall x^{\sigma})(\exists y^{\tau})\varphi(x, y) \quad \multimap \quad (\exists f^{(\tau)\sigma})(\forall x^{\sigma})\varphi(x, (f)x) \tag{\mathsf{LAC}}$$

which is realizable in the sense of both (−)^D and [26], but whose translation to MSO(**M**) (which precludes (5)) is false in the standard model.

#### **3 A Monadic Linear Dialectica-Like Interpretation**

Gödel's "Dialectica" functional interpretation associates to ϕ(a) a formula ϕ^D(a) of the form (∃u^τ)(∀x^σ)ϕ_D(u, x, a). In usual versions formulated in higher-types arithmetic (see e.g. [1,16]), the formula ϕ_D is quantifier-free, so that ϕ^D is a prenex form of ϕ. This prenex form is constructive, and a constructive proof of ϕ can be turned into a proof of ϕ^D with an explicit (closed) witness for ∃u. We call such witnesses *realizers* of ϕ. Even if Dialectica originally interprets intuitionistic arithmetic, it is structurally linear: in general, realizers of contraction

$$
\varphi(a) \quad \multimap \quad \varphi(a) \wedge \varphi(a),
$$

#### A Dialectica-Like Interpretation of a Linear MSO on Infinite Words 477


**Fig. 2.** Deduction for MF (where *<sup>z</sup>*<sup>τ</sup> is fresh).

only exist when the term language can decide ϕ<sub>D</sub>(u, x, a), which is possible in arithmetic but not in all settings. Besides, linear versions of Dialectica were formulated at the very beginning of linear logic [21–23] (see also [14,27]).
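To see where this decidability requirement comes from (a standard observation on Dialectica, restated here with the notation of this paper): writing ϕ<sup>D</sup>(a) = (∃u)(∀x)ϕ<sub>D</sub>(u, x, a), the interpretation of the contracted formula is

$$(\varphi(a) \wedge \varphi(a))^{D} \;=\; (\exists \langle u_1,u_2\rangle)(\forall \langle x_1,x_2\rangle)\,\big(\varphi_D(u_1,x_1,a) \wedge \varphi_D(u_2,x_2,a)\big)$$

so a realizer of contraction must, given u and a pair of counter-witnesses ⟨x<sub>1</sub>, x<sub>2</sub>⟩, select a single x such that ϕ<sub>D</sub>(u, x, a) implies ϕ<sub>D</sub>(u, x<sub>1</sub>, a) ∧ ϕ<sub>D</sub>(u, x<sub>2</sub>, a); choosing correctly between x<sub>1</sub> and x<sub>2</sub> amounts to testing whether ϕ<sub>D</sub>(u, x<sub>1</sub>, a) holds.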

In this paper, we use a variant of Dialectica as a syntactic formulation of the automata-based realizability model of [26]. The formula ϕ<sub>D</sub> essentially represents a deterministic automaton on ω-words and is in general not quantifier-free. Moreover, we extract f.s. causal functions, while the category **M** is not closed. As a result, a realizer of ϕ is an *open* (eager) term u(x) of arity (σ; τ) satisfying ϕ<sub>D</sub>(u(x), x). While it is possible to exhibit realizers for contraction on closed ϕ thanks to the Büchi-Landweber Theorem, this is generally not the case for open ϕ(a). We therefore resort to working in a linear system, in which we obtain witnesses for ∀∃(−)<sup>L</sup>-statements (and thus for realizable instances of Church's synthesis), but not for all ∀∃-statements.

Fix a set of atomic formulae At containing all equalities (t<sup>τ</sup> ≐ u<sup>τ</sup>), and a standard interpretation extending that of Sect. 2 for each α ∈ At.

#### **3.1 The Multiplicative Fragment**

Our linear system is based on *full intuitionistic linear logic* (see [15]). The formulae of the multiplicative fragment MF are given by the grammar:

$$\varphi, \psi \;::=\; \mathbf{I} \mid \bot \mid \alpha \mid \varphi \multimap \psi \mid \varphi \otimes \psi \mid \varphi \mathbin{⅋} \psi \mid (\exists x^{\tau}) \varphi \mid (\forall x^{\tau}) \varphi$$

(where α ∈ At). Deduction is given by the rules of Fig. 2 and the axioms

$$\vdash t^{\tau} \doteq t^{\tau} \qquad\qquad t^{\tau} \doteq u^{\tau},\; \varphi[t^{\tau}/x^{\tau}] \vdash \varphi[u^{\tau}/x^{\tau}] \tag{7}$$

Each formula ϕ of MF can be mapped to a classical formula ⌊ϕ⌋ (where **I**, ⊸, ⊗, ⅋ are replaced resp. by ⊤, →, ∧, ∨). Hence ⌊ϕ⌋ holds in the standard model whenever ϕ is derivable.

The Dialectica interpretation of MF is the usual one rewritten with the connectives of MF, except for the disjunction ⅋, which we treat similarly to ⊗. To each

$$\begin{array}{rcl}
(\varphi \otimes \psi)^{D}(a) := \exists\langle u,v\rangle\,\forall\langle x,y\rangle.\,(\varphi \otimes \psi)_{D}(\langle u,v\rangle,\langle x,y\rangle,a) &:=& \varphi_{D}(u,x,a) \otimes \psi_{D}(v,y,a)\\
(\varphi \mathbin{⅋} \psi)^{D}(a) := \exists\langle u,v\rangle\,\forall\langle x,y\rangle.\,(\varphi \mathbin{⅋} \psi)_{D}(\langle u,v\rangle,\langle x,y\rangle,a) &:=& \varphi_{D}(u,x,a) \mathbin{⅋} \psi_{D}(v,y,a)\\
(\varphi \multimap \psi)^{D}(a) := \exists\langle f,F\rangle\,\forall\langle u,y\rangle.\,(\varphi \multimap \psi)_{D}(\langle f,F\rangle,\langle u,y\rangle,a) &:=& \varphi_{D}(u,(F)uy,a) \multimap \psi_{D}((f)u,y,a)\\
(\exists w.\,\varphi)^{D}(a) := \exists\langle u,w\rangle\,\forall x.\,(\exists w.\,\varphi)_{D}(\langle u,w\rangle,x,a) &:=& \varphi_{D}(u,x,\langle a,w\rangle)\\
(\forall w.\,\varphi)^{D}(a) := \exists f\,\forall\langle x,w\rangle.\,(\forall w.\,\varphi)_{D}(f,\langle x,w\rangle,a) &:=& \varphi_{D}((f)w,x,\langle a,w\rangle)
\end{array}$$

**Fig. 3.** The Dialectica interpretation of MF (where types are left implicit).

formula ϕ(a) with only a free, we associate a formula ϕ<sup>D</sup>(a) with only a free, as well as a formula ϕ<sub>D</sub> with possibly other free variables. For atomic formulae we let ϕ<sup>D</sup>(a) := ϕ<sub>D</sub>(a) := ϕ(a). The inductive cases are given in Fig. 3, where ϕ<sup>D</sup>(a) = (∃u)(∀x)ϕ<sub>D</sub>(u, x, a) and ψ<sup>D</sup>(a) = (∃v)(∀y)ψ<sub>D</sub>(v, y, a).

Dialectica is such that ϕ<sup>D</sup> is equivalent to ϕ via possibly non-intuitionistic but constructive principles. The tricky connectives are implication and universal quantification. Similarly as in the intuitionistic case (see e.g. [1,16,33]), (ϕ ⊸ ψ)<sup>D</sup> is a prenex form of ϕ<sup>D</sup> ⊸ ψ<sup>D</sup> obtained using (LAC) together with linear variants of the *Markov* and *Independence of Premises* principles. In our case, the equivalence also requires additional axioms for ⊗ and ⅋. We give details for the full system in Sect. 3.3.
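For the reader's convenience, the prenexification behind the implication clause can be sketched as follows (a standard chain of equivalences, written with the notation of Fig. 3; each annotated step uses the indicated principle):

$$\begin{array}{lll}
(\exists u)(\forall x)\varphi_D \multimap (\exists v)(\forall y)\psi_D
&\simeq\; (\forall u)\big((\forall x)\varphi_D \multimap (\exists v)(\forall y)\psi_D\big) &\\
&\simeq\; (\forall u)(\exists v)(\forall y)\big((\forall x)\varphi_D \multimap \psi_D\big) &\text{(independence of premises)}\\
&\simeq\; (\forall u)(\exists v)(\forall y)(\exists x)\big(\varphi_D \multimap \psi_D\big) &\text{(Markov)}\\
&\simeq\; (\exists \langle f,F\rangle)(\forall \langle u,y\rangle)\big(\varphi_D(u,(F)uy) \multimap \psi_D((f)u,y)\big) &\text{(LAC)}
\end{array}$$

which is exactly the clause for ⊸ in Fig. 3.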

The soundness of (−)<sup>D</sup> goes as usual, except that we extract *open eager* terms: from a proof of ϕ(a<sup>κ</sup>) we extract a realizer of (∀a)ϕ(a), that is, an open eager term u(x, a) s.t. ϕ<sub>D</sub>(@(u(x, a), a), x, a). Composition of realizers (in particular required for the cut rule) is given by the fixpoints of Proposition 3. Note that a realizer of a closed ϕ is a finite-state winning P-strategy in G(ϕ<sub>D</sub>)(u, x).

#### **3.2 Polarized Exponentials**

It is well known that the structure of Dialectica is linear, as it makes the interpretation of contraction problematic:

$$
\varphi(a) \quad \multimap \quad \varphi(a) \otimes \varphi(a) \qquad \text{and} \qquad \varphi(a) \mathbin{⅋} \varphi(a) \quad \multimap \quad \varphi(a)
$$

In our case, the Büchi-Landweber Theorem implies that all closed instances of contraction have realizers which are correct in the standard model. But this is in general not true for open instances.

*Example 3.* Realizers of ϕ ⊸ ϕ ⊗ ϕ for a closed ϕ are given by eager terms U<sub>1</sub>(u, x<sub>1</sub>, x<sub>2</sub>), U<sub>2</sub>(u, x<sub>1</sub>, x<sub>2</sub>), X(u, x<sub>1</sub>, x<sub>2</sub>) which must represent P-strategies in the game G(Φ)(U<sub>1</sub>, U<sub>2</sub>, X, u, x<sub>1</sub>, x<sub>2</sub>), where Φ is

$$\lfloor \varphi_D(u,(X)ux_1x_2) \rfloor \quad \longrightarrow \quad \lfloor \varphi_D((U_1)u,x_1) \rfloor \;\wedge\; \lfloor \varphi_D((U_2)u,x_2) \rfloor$$

By the Büchi-Landweber Theorem (Theorem 1), either there is an eager term U(x) such that ⌊ϕ<sub>D</sub>(U(x), x)⌋ holds, so that

$$\lfloor \varphi_D(u, x_1) \rfloor \quad \longrightarrow \quad \lfloor \varphi_D(\mathbf{e}(\mathsf{U})(x_1), x_1) \rfloor \;\wedge\; \lfloor \varphi_D(\mathbf{e}(\mathsf{U})(x_2), x_2) \rfloor$$

or there is an eager term X(u) such that ¬⌊ϕ<sub>D</sub>(u, **e**(X)(u))⌋ holds, so that

$$\lfloor \varphi_D(u, \mathbf{e}(\mathsf{X})(u)) \rfloor \quad \longrightarrow \quad \lfloor \varphi_D(u, x_1) \rfloor \;\wedge\; \lfloor \varphi_D(u, x_2) \rfloor.$$

*Example 4.* Consider the open formula ϕ(a<sup>o</sup>) := (∀x<sup>o</sup>)(t(x, a) ≐ 0<sup>ω</sup>) where t(B, C) = 0<sup>n+1</sup>1<sup>ω</sup> for the first n ∈ ℕ with C(n + 1) = B(0) if such an n exists, and t(B, C) = 0<sup>ω</sup> otherwise. The game induced by ((∀a)(ϕ ⊸ ϕ ⊗ ϕ))<sup>D</sup> is G(Φ)(X, x<sub>1</sub>, x<sub>2</sub>, a), where Φ is

$$t((X)x_1x_2a,\,a) \doteq 0^{\omega} \quad \longrightarrow \quad t(x_1,a) \doteq 0^{\omega} \;\wedge\; t(x_2,a) \doteq 0^{\omega}$$

In this game, P begins by playing a function **2**<sup>3</sup> → **2**, O replies in **2**<sup>3</sup>, and then P and O keep on alternately playing moves of the expected type. A finite-state winning strategy for O is easy to find. Let P begin with the function X. Fix some a ∈ **2** and let i := X(0, 1, a). O replies (0, 1, a) to X. The further moves of P are irrelevant, and O keeps on playing (−, −, 1 − i) (the values of x<sub>1</sub> and x<sub>2</sub> are irrelevant after the first round). This strategy ensures

$$t((X)x_1x_2a,\,a) \doteq 0^{\omega} \quad \wedge \quad \neg\big(t(x_1,a) \doteq 0^{\omega} \;\wedge\; t(x_2,a) \doteq 0^{\omega}\big).$$

Hence we cannot realize contraction while remaining correct w.r.t. the standard model. On the other hand, Dialectica induces polarities generalizing the usual polarities of linear logic (see e.g. [17]). Say that ϕ(a) is *positive* (resp. *negative*) if ϕ<sup>D</sup>(a) is of the form ϕ<sup>D</sup>(a) = (∃u<sup>τ</sup>)ϕ<sub>D</sub>(u, −, a) (resp. ϕ<sup>D</sup>(a) = (∀x<sup>σ</sup>)ϕ<sub>D</sub>(−, x, a)). Quantifier-free formulae are thus both positive and negative.

*Example 5.* Polarized contraction

$$\varphi^{+} \multimap \varphi^{+} \otimes \varphi^{+} \qquad \text{and} \qquad \psi^{-} \mathbin{⅋} \psi^{-} \multimap \psi^{-}$$

gives realizers of all instances of itself. Indeed, with say ϕ<sup>D</sup>(a) = (∃u)ϕ<sub>D</sub>(u, −, a) and ψ<sup>D</sup>(a) = (∀y)ψ<sub>D</sub>(−, y, a), *Λ*(π<sub>1</sub>) (for π<sub>1</sub> an **M**-projection on suitable types) gives eager terms U(u, a) and Y(y, a) such that

$$\begin{array}{rcl}
\varphi_D(u,-,a) & \longrightarrow & \varphi_D\big(\mathbf{e}(\mathsf{U})(u,a),-,a\big) \;\otimes\; \varphi_D\big(\mathbf{e}(\mathsf{U})(u,a),-,a\big)\\[2pt]
\text{and}\quad \psi_D\big(-,\mathbf{e}(\mathsf{Y})(y,a),a\big) \mathbin{⅋} \psi_D\big(-,\mathbf{e}(\mathsf{Y})(y,a),a\big) & \longrightarrow & \psi_D(-,y,a)
\end{array}$$

We only have exponentials for polarized formulae. First, following the usual polarities of linear logic, we can let

$$\begin{array}{rcl}
(!(\varphi^{+}))^D(a) & := & (\exists u)(!(\varphi^{+}))_D(u,-,a) \;:=\; (\exists u)\,!\varphi_D(u,-,a)\\
(?(\psi^{-}))^D(a) & := & (\forall y)(?(\psi^{-}))_D(-,y,a) \;:=\; (\forall y)\,?\psi_D(-,y,a)
\end{array} \tag{8}$$


**Fig. 4.** Exponential rules of PF.

Hence !ϕ is positive for a positive ϕ and ?ψ is negative for a negative ψ. The following exponential contraction axioms are then interpreted by themselves:

$$!(\varphi^{+}) \;\multimap\; !(\varphi^{+}) \otimes\, !(\varphi^{+}) \qquad \text{and} \qquad ?(\psi^{-}) \mathbin{⅋}\, ?(\psi^{-}) \;\multimap\; ?(\psi^{-})$$

Second, we can have exponentials !(ψ<sup>−</sup>) and ?(ϕ<sup>+</sup>) with the automata-based reading of [26]. Positive formulae are seen as non-deterministic automata, and ?(−) on positive formulae is determinization on ω-words (McNaughton's Theorem [19]). Negative formulae are seen as universal automata, and !(−) on negative formulae is co-determinization (an instance of the *Simulation Theorem* [10,20]). Formulae which are both positive and negative (notation (−)<sup>±</sup>) correspond to deterministic automata, and are called *deterministic*. We let

$$\begin{array}{rcl} (!(\psi^{-}))^D(a) & := & (!(\psi^{-}))\_D(-,-,a) & := & !(\forall x)\psi\_D(-,x,a) \\ (?(\varphi^{+}))^D(a) & := & (?(\varphi^{+}))\_D(-,-,a) & := & ?(\exists u)\varphi\_D(u,-,a) \end{array} \tag{9}$$

So !(ψ<sup>−</sup>) and ?(ϕ<sup>+</sup>) are always deterministic. The corresponding exponential contraction axioms are interpreted by themselves. This leads to the following polarized fragment PF (the deduction rules for exponentials are given in Fig. 4):

$$\begin{array}{lcl}
\varphi^{\pm}, \psi^{\pm} & ::= & \mathbf{I} \mid \bot \mid \alpha \mid\, !(\varphi^{-}) \mid\, ?(\varphi^{+}) \mid \varphi^{\pm} \otimes \psi^{\pm} \mid \varphi^{\pm} \mathbin{⅋} \psi^{\pm} \mid \varphi^{\pm} \multimap \psi^{\pm}\\
\varphi^{+}, \psi^{+} & ::= & \varphi^{\pm} \mid\, !(\varphi^{+}) \mid (\exists x^{\sigma})\varphi^{+} \mid \varphi^{+} \otimes \psi^{+} \mid \varphi^{+} \mathbin{⅋} \psi^{+} \mid \varphi^{-} \multimap \psi^{+}\\
\varphi^{-}, \psi^{-} & ::= & \varphi^{\pm} \mid\, ?(\varphi^{-}) \mid (\forall x^{\sigma})\varphi^{-} \mid \varphi^{-} \otimes \psi^{-} \mid \varphi^{-} \mathbin{⅋} \psi^{-} \mid \varphi^{+} \multimap \psi^{-}
\end{array}$$

#### **3.3 The Full System**

The formulae of the full system FS are given by the following grammar:

$$\varphi, \psi \;::=\; \varphi^{+} \mid \varphi^{-} \mid \varphi \multimap \psi \mid \varphi \otimes \psi \mid \varphi \mathbin{⅋} \psi \mid (\exists x^{\tau})\varphi \mid (\forall x^{\tau})\varphi$$

Deduction in FS is given by Figs. 2, 4 and (7). We extend ⌊−⌋ to FS with ⌊!ϕ⌋ := ⌊?ϕ⌋ := ⌊ϕ⌋. Hence ⌊ϕ⌋ holds when ϕ is derivable. The Dialectica interpretation of FS is given by Fig. 3 and (8), (9) (still taking ϕ<sup>D</sup>(a) := ϕ<sub>D</sub>(a) := ϕ(a) for atoms). Note that (−)<sup>D</sup> preserves and reflects polarities.

**Theorem 2 (Soundness).** *Let* ϕ *be closed with* ϕ<sup>D</sup> = (∃u<sup>τ</sup>)(∀x<sup>σ</sup>)ϕ<sub>D</sub>(u, x)*. From a proof of* ϕ *in* FS *one can extract an eager term* u(x) *such that* FS *proves* (∀x<sup>σ</sup>)ϕ<sub>D</sub>(u(x), x)*.*

As usual, proving the equivalence of ϕ with ϕ<sup>D</sup> requires extra axioms. Besides (LAC), we use the following (*linear*) *semi-intuitionistic principles* (LSIP), with polarities as shown:

$$\begin{array}{rcl}
(\forall a)(\varphi^{-}(a) \otimes \psi^{-}) & \multimap & (\forall a)\varphi^{-}(a) \otimes \psi^{-}\\
(\forall a)(\varphi^{-}(a) \mathbin{⅋} \psi^{-}) & \multimap & (\forall a)\varphi^{-}(a) \mathbin{⅋} \psi^{-}\\
(\exists a)\varphi^{-}(a) \mathbin{⅋} \psi^{-} & \multimap & (\exists a)(\varphi^{-}(a) \mathbin{⅋} \psi^{-})\\
\big(\psi^{-} \multimap (\exists a)\varphi^{-}(a)\big) & \multimap & (\exists a)(\psi^{-} \multimap \varphi^{-}(a))\\
\big((\forall a)\varphi^{\pm}(a) \multimap \psi^{\pm}\big) & \multimap & (\exists a)(\varphi^{\pm}(a) \multimap \psi^{\pm})
\end{array} \tag{LSIP}$$

as well as the following *deterministic exponential* axioms (DEXP):

$$\delta \multimap\; !\delta \qquad \text{and} \qquad ?\delta \multimap \delta \qquad (\delta \text{ deterministic}) \tag{DEXP}$$

All these axioms but (LAC) are true in the standard model (via ⌊−⌋). Moreover:

**Proposition 4.** *The axioms* (LAC) *and* (LSIP) *are realized in* FS*. The axioms* (DEXP) *are realized in* FS + (DEXP)*.*

**Theorem 3 (Characterization).** *We have*

$$\begin{array}{l}
\vdash_{\mathsf{FS}+(\mathsf{LAC})+(\mathsf{LSIP})+(\mathsf{DEXP})}\ \varphi(a) \multimap \varphi^{D}(a)\\[2pt]
\vdash_{\mathsf{FS}+(\mathsf{LSIP})+(\mathsf{DEXP})}\ \varphi^{D}(a) \multimap \varphi(a)
\end{array}$$

**Corollary 1 (Extraction).** *Consider a closed formula* ϕ := (∀x<sup>σ</sup>)(∃u<sup>τ</sup>)δ(u, x) *with* δ *deterministic. From a proof of* ϕ *in* FS + (LAC) + (LSIP) + (DEXP) *one can extract a term* t(x) *such that* ⊨ (∀x<sup>σ</sup>)δ(t(x), x)*.*

Note that FS + (DEXP) proves δ ⊸ δ ⊗ δ and δ ⅋ δ ⊸ δ for all deterministic δ.

#### **3.4 Translations of Classical Logic**

There are many translations from classical to linear logic. Two canonical possibilities are the (−)<sup>T</sup>- and (−)<sup>Q</sup>-translations of [9] (see also [17,18]), targeting resp. negative and positive formulae. Both take classical sequents to linear sequents of the form !(−) ⊢ ?(−), which are provable in FS thanks to the PF rules

$$\frac{\overline{\varphi}, !\varphi \vdash \psi, ?\overline{\psi}}{\overline{\varphi} \vdash !\varphi \multimap \psi, ?\overline{\psi}} \qquad\qquad\qquad \frac{\overline{\varphi} \vdash \varphi, ?\overline{\psi}}{\overline{\varphi} \vdash (\forall z)\varphi, ?\overline{\psi}}$$

For the completeness of LMSO(C) (Theorem 6, Sect. 4), we shall actually require a translation (−)<sup>L</sup> such that the linear equivalences (with polarities as displayed)

$$?\varphi^{+} \;⧟\; [\varphi^{+}]^{L} \qquad\qquad \delta^{\pm} \;⧟\; [\delta^{\pm}]^{L} \qquad\qquad !\psi^{-} \;⧟\; [\psi^{-}]^{L} \tag{10}$$

are provable, possibly with extra axioms that we require to realize themselves. In particular, (10) implies (DEXP), and (−)<sup>L</sup> should give deterministic formulae. While (−)<sup>T</sup> and (−)<sup>Q</sup> can be adapted accordingly, (10) induces axioms which make the resulting translations equivalent to the deterministic (−)<sup>L</sup>-translation of [26]:

$$\begin{array}{lll}
\bot^L := \bot \qquad \top^L := \mathbf{I} \qquad \alpha^L := \alpha & (\varphi \vee \psi)^L := \varphi^L \mathbin{⅋} \psi^L & (\exists x^{\sigma}.\varphi)^L := \,?(\exists x^{\sigma})\varphi^L\\
(\varphi \to \psi)^L := \varphi^L \multimap \psi^L & (\varphi \wedge \psi)^L := \varphi^L \otimes \psi^L & (\forall x^{\sigma}.\varphi)^L := \,!(\forall x^{\sigma})\varphi^L
\end{array}$$

**Proposition 5.** *The scheme (10) is equivalent in* FS *to* (DEXP)+(PEXP)*, where* (PEXP) *are the following* polarized exponential *axioms, with polarities as shown:*

**Proposition 6.** *If* ϕ *is provable in many-sorted classical logic with equality then* FS + (DEXP) *proves* ϕ<sup>L</sup>*.*

**Proposition 7.** *The axioms* (PEXP) *are realized in* FS + (LSIP)+(DEXP) + (PEXP)*. Corollary 1 thus extends to* FS + (LAC)+(LSIP)+(DEXP)+(PEXP)*.*

Note that ϕ<sup>L</sup> is deterministic and that ⌊ϕ<sup>L</sup>⌋ = ϕ.

#### **4 Completeness**

In Sect. 3 we devised a Dialectica-like interpretation (−)<sup>D</sup> providing a syntactic extraction procedure for ∀∃(−)<sup>L</sup>-statements. In this section, building on an axiomatic treatment of MSO(**M**), we show that LMSO, an arithmetic extension of FS + (LSIP) + (DEXP) + (PEXP) adapted from [26], is correct and complete w.r.t. Church's synthesis, in the sense that the provable ∀∃(−)<sup>L</sup>-statements are exactly the realizable ones. We then turn to the main result of this paper, namely the completeness of LMSO(C) := LMSO + (LAC). We fix the set of atomic formulae

$$\alpha \in \mathrm{At} \;::=\; t^{\tau} \doteq u^{\tau} \;\mid\; t^{o} \mathbin{\dot\subseteq} u^{o} \;\mid\; \mathsf{E}(t^{o}) \;\mid\; \mathsf{N}(t^{o}) \;\mid\; \mathsf{S}(t^{o}, u^{o}) \;\mid\; \mathsf{0}(t^{o}) \;\mid\; t^{o} \mathbin{\dot\leq} u^{o}$$

#### **4.1 The Logic MSO(M)**

MSO(**M**) is many-sorted first-order logic with atomic formulae α ∈ At. Its sorts and terms are those given in Sect. 2, and its standard interpretation extends that of Sect. 2 as follows: ⊆̇ is set inclusion, E holds on B iff B is empty, N (resp. 0) holds on B iff B is a singleton {n} (resp. the singleton {0}), and S(B, C) (resp. B ≤̇ C) holds iff B = {n} and C = {n + 1} for some n ∈ ℕ (resp. B = {n} and C = {m} for some n ≤ m). We write x<sup>ι</sup> for variables x<sup>o</sup> relativized to N, so that (∃x<sup>ι</sup>)ϕ and (∀x<sup>ι</sup>)ϕ stand resp. for (∃x<sup>o</sup>)(N(x) ∧ ϕ) and (∀x<sup>o</sup>)(N(x) → ϕ). Moreover, x<sup>ι</sup> ∈̇ t stands for x<sup>ι</sup> ⊆̇ t, so that t<sup>o</sup> ⊆̇ u<sup>o</sup> is equivalent to (∀x<sup>ι</sup>)(x ∈̇ t → x ∈̇ u).

The logic MSO<sup>+</sup> [26] is MSO(**M**) restricted to the type o, hence with only terms for Mealy machines of sort (**2**, ..., **2**; **2**). The MSO of [26] is the purely relational (term-free) restriction of MSO<sup>+</sup>. Recall from [26, Prop. 2.6] that for



**Fig. 5.** The Arithmetic Rules of MSO(**M**) and LMSO (with terms of sort *o* and *z* fresh).

each Mealy machine M : **2**<sup>p</sup> → **2**, there is an MSO-formula δ<sub>M</sub>(x, X) such that for all n ∈ ℕ and all B ∈ (**2**<sup>ω</sup>)<sup>p</sup>, we have F<sub>M</sub>(B)(n) = 1 iff δ<sub>M</sub>({n}, B) holds.

The axioms of MSO(**M**) are the arithmetic rules of Fig. 5, the axioms (7) and the following, where <sup>M</sup> : **<sup>2</sup>**<sup>p</sup> <sup>→</sup> **<sup>2</sup>** and y, z,X are fresh.

$$\vdash (\forall \overline{X}^{o})(\forall x^{\iota})\big(x \mathbin{\dot\in} \mathsf{f}_{\mathcal{M}}(\overline{X}) \leftrightarrow \delta_{\mathcal{M}}(x, \overline{X})\big) \qquad\qquad \vdash (\exists X^{o})(\forall x^{\iota})\big(x \mathbin{\dot\in} X \leftrightarrow \varphi\big)$$

$$\frac{\overline{\varphi},\, \mathsf{0}(z) \vdash \varphi[z/x],\, \overline{\varphi}' \qquad \overline{\varphi},\, \mathsf{S}(y,z),\, \varphi[y/x] \vdash \varphi[z/x],\, \overline{\varphi}'}{\overline{\varphi} \vdash (\forall x^{\iota})\varphi,\, \overline{\varphi}'}$$

The theory MSO(**M**) is complete: provability in MSO(**M**) and validity in the standard model coincide. This extends [26, Thm. 2.11] (via [28]).

**Theorem 4 (Completeness of** MSO(**M**)**).** *For closed* MSO(**M**)*-formulae* ϕ*, we have* ⊨ ϕ *if and only if* MSO(**M**) ⊢ ϕ*.*

#### **4.2 The Logic LMSO**

The system LMSO is FS + (LSIP) + (DEXP) + (PEXP) extended with the arithmetic rules of Fig. 5 and the axioms above.

Let LMSO(C) := LMSO + (LAC). Note that MSO(**M**) ⊢ ⌊ϕ⌋ whenever LMSO ⊢ ϕ. Proposition 6 extends, so that similarly as in [26] we have

**Proposition 8.** *If* MSO(**M**) ⊢ ϕ *then* LMSO ⊢ ϕ<sup>L</sup>*. In particular, for a realizable instance of Church's synthesis* (∀x<sup>σ</sup>)(∃u<sup>τ</sup>)ϕ(u, x)*, we have* LMSO ⊢ (∀x<sup>σ</sup>)(∃u<sup>τ</sup>)ϕ<sup>L</sup>(u, x)*.*

Moreover, the soundness of (−)<sup>D</sup> extends to LMSO. It follows that LMSO(C) is consistent and proves exactly the realizable ∀∃(−)<sup>L</sup>-statements.

**Theorem 5 (Soundness).** *Let* ϕ *be closed with* ϕ<sup>D</sup> = (∃u<sup>τ</sup>)(∀x<sup>σ</sup>)ϕ<sub>D</sub>(u, x)*. From a proof of* ϕ *in* LMSO(C) *one can extract an eager term* u(x) *such that* LMSO *proves* (∀x<sup>σ</sup>)ϕ<sub>D</sub>(u(x), x)*.*

**Corollary 2 (Extraction).** *Consider a closed formula* ϕ := (∀x<sup>σ</sup>)(∃u<sup>τ</sup>)δ(u, x) *with* δ *deterministic. From a proof of* ϕ *in* LMSO(C) *one can extract a term* t(x) *such that* ⊨ (∀x<sup>σ</sup>)δ(t(x), x)*.*

#### **4.3 Completeness of LMSO(C)**

The completeness of LMSO(C) follows from a couple of important facts. First, LMSO(C) proves the elimination of linear double negation, using (via Theorem 3) the same trick as in [26].

**Lemma 1.** *For all* LMSO*-formulae* ϕ*, we have* (ϕ ⊸ ⊥) ⊸ ⊥ ⊢<sub>LMSO(C)</sub> ϕ*.*

Combining Lemma 1 with (LAC) gives classical linear choice.

**Corollary 3.** (∀f)(∃x)ϕ(x, (f)x) ⊢<sub>LMSO(C)</sub> (∃x)(∀y)ϕ(x, y)*.*

The key to the completeness of LMSO(C) is the following quantifier inversion.

**Lemma 2.** (∀x<sup>σ</sup>)ϕ(t<sup>τ</sup>(x), x) ⊢<sub>LMSO(C)</sub> (∃u<sup>τ</sup>)(∀x<sup>σ</sup>)ϕ(u, x)*, where* t(x) *is eager.*

Lemma 2 follows (via Corollary 3) from the fixpoints on eager machines (Proposition 3). Fix an eager t<sup>τ</sup>(x<sup>σ</sup>). Taking the fixpoint of (f)t(x) : σ × (σ)τ →<sub>EM</sub> σ gives a term v<sup>σ</sup>(f<sup>(σ)τ</sup>) such that v(f) ≐ @(f, t(v(f))). Then conclude with

$$\begin{array}{rcl}
(\forall x^{\sigma})\varphi(t(x), x) & \vdash_{\mathsf{LMSO}} & \varphi\big(t(v(f)),\, v(f)\big)\\
& \vdash_{\mathsf{LMSO}} & \varphi\big(t(v(f)),\, @(f, t(v(f)))\big)\\
& \vdash_{\mathsf{LMSO}} & (\exists u^{\tau})\varphi\big(u, (f)u\big)\\
& \vdash_{\mathsf{LMSO}} & (\forall f^{(\sigma)\tau})(\exists u^{\tau})\varphi\big(u, (f)u\big)\\
& \vdash_{\mathsf{LMSO(C)}} & (\exists u^{\tau})(\forall x^{\sigma})\varphi(u, x)
\end{array}$$

Completeness of LMSO(C) then follows via (−)<sup>D</sup>, Proposition 5, completeness of MSO(**M**) and the Büchi-Landweber Theorem (Theorem 1). The idea is to lift a finite-state winning P-strategy in G(ϕ<sub>D</sub>(u, x))(u, x) to a realizer of ϕ<sup>D</sup> = (∃u)(∀x)ϕ<sub>D</sub>(u, x) in LMSO(C).

**Theorem 6 (Completeness of** LMSO(C)**).** *For each closed formula* ϕ*, either* LMSO(C) ⊢ ϕ *or* LMSO(C) ⊢ ϕ ⊸ ⊥*.*

#### **5 Conclusion**

We provided a linear Dialectica-like interpretation of LMSO(C), a linear variant of MSO on ω-words based on [26]. Our interpretation is correct and complete w.r.t. Church's synthesis, in the sense that it proves exactly the realizable ∀∃(−)<sup>L</sup>-statements. We thus obtain a syntactic extraction procedure with the correctness proof internalized in LMSO(C). The system LMSO(C) is moreover complete in the sense that for every closed formula ϕ, it proves either ϕ or its linear negation. While completeness for a linear logic necessarily collapses some linear structure, the corresponding axioms (DEXP) and (PEXP) do respect the structural constraints allowing for realizer extraction from proofs. The completeness of LMSO(C) contrasts with that of the classical system MSO(**M**), since the latter has provable unrealizable ∀∃-statements. In particular, proof search in LMSO(C) for ∀∃(−)<sup>L</sup>-formulae and their negations is correct and complete w.r.t. Church's synthesis. The design of the Dialectica interpretation also clarified the linear structure of LMSO, as it allowed us to decompose it starting from a system based on usual full intuitionistic linear logic (see e.g. [3] for recent references on the subject).

An outcome of witness extraction for LMSO(C) is the realization of a simple version of the fan rule (in the usual sense of e.g. [16]). We plan to investigate monotone variants of Dialectica for our setting. Thanks to the compactness of Σ<sup>ω</sup>, we expect this to allow extraction of uniform bounds, possibly with translations to stronger constructive logics than LMSO.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Deciding Equivalence of Separated Non-nested Attribute Systems in Polynomial Time**

Helmut Seidl<sup>1</sup>, Raphaela Palenta1(B), and Sebastian Maneth<sup>2</sup>

<sup>1</sup> Fakultät für Informatik, TU München, Munich, Germany {seidl,palenta}@in.tum.de <sup>2</sup> FB3 - Informatik, Universität Bremen, Bremen, Germany maneth@uni-bremen.de

**Abstract.** In 1982, Courcelle and Franchi-Zannettacci showed that the equivalence problem of separated non-nested attribute systems can be reduced to the equivalence problem of total deterministic separated basic macro tree transducers. They also gave a procedure for deciding equivalence of transducers in the latter class. Here, we reconsider this equivalence problem. We present a new alternative decision procedure and prove that it runs in polynomial time. We also consider extensions of this result to partial transducers and to the case where parameters of transducers accumulate strings instead of trees.

#### **1 Introduction**

Attribute grammars are a well-established formalism for realizing computations on syntax trees [20,21], and implementations are available for various programming languages, see, e.g. [12,28,29]. A fundamental question for any such specification formalism is whether two specifications are semantically equivalent. As a particular case, attribute grammars have been considered which compute uninterpreted trees. Such devices, which translate input trees (viz. the parse trees of a context-free grammar) into output trees, have also been studied under the name "attributed tree transducer" [14] (see also [15]). In 1982, Courcelle and Franchi-Zannettacci showed that the equivalence problem for (strongly noncircular) attribute systems reduces to the equivalence problem for primitive recursive schemes with parameters [3]; the latter model is also known under the name *macro tree transducer* [9]. Whether or not equivalence is decidable for attributed tree transducers (ATTs) and for (deterministic) macro tree transducers (MTTs) remain two intriguing (and very difficult) open problems.

For several subclasses of ATTs it has been proven that equivalence is decidable. The most general and very recent result, which covers almost all other known ones about deterministic tree transducers, is that "deterministic top-down tree-to-string transducers" have decidable equivalence [27]. Notice that the complexity of this problem remains unknown (the decidability is proved via two semi-decision procedures). The only result concerning deterministic tree transducers that we are aware of and that is *not* covered by this general result is the one by Courcelle and Franchi-Zannettacci about decidability of equivalence of "separated non-nested" ATTs (which they reduce to the same problem for "separated non-nested" MTTs). However, in their paper no statement is given concerning the complexity of the problem. In this paper we close this gap and study the complexity of deciding equivalence of separated non-nested MTTs. To do so we propose a new approach that we feel is simpler and easier to understand than the one of [3]. Using our approach we can prove that the problem can be solved in polynomial time.

**Fig. 1.** Input tree for 2101.02 (in ternary) and corresponding output tree of Mtern.

In a separated non-nested attribute system, distinct sets of operators are used for the construction of inherited and synthesized attributes, respectively, and inherited attributes may depend on inherited attributes only. Courcelle and Franchi-Zannettacci's algorithm first translates separated non-nested attribute grammars into separated total deterministic non-nested macro tree transducers. In the sequel we will use the more established term *basic* macro tree transducers instead of non-nested MTTs. Here, a macro tree transducer is called *separated* if the alphabets used for the construction of parameter values and outside of parameter positions are disjoint. The MTT is *basic* if there is no nesting of state calls, i.e., there are no state calls inside of parameter positions. Let us consider an example. We want to translate ternary numbers into expressions over +, ∗, EXP, plus the constants 0, 1, and 2. Additionally, operators s, p, and z are used to represent integers in unary. The ternary numbers are parsed into particular binary trees; e.g., the left of Fig. 1 shows the binary tree for the

$$\begin{array}{llll} q\_0(g(x\_1, x\_2)) & \to & +(q(x\_1, z), q'(x\_2, p(z))) \\ q(f(x\_1, x\_2), y) & \to & +(r(x\_2, y), q(x\_1, s(y))) \\ q'(f(x\_1, x\_2), y) & \to & +(r(x\_1, y), q'(x\_2, p(y))) \\ \phi(i, y) & \to & \*(i, \text{EXP}(3, y)) \quad \text{for } i \in \{0, 1, 2\}, \phi \in \{q, q', r\} \end{array}$$

**Fig. 2.** Rules of the transducer Mtern.

number 2101.02. This tree is translated by our MTT into the tree on the right of Fig. 1 (which indeed evaluates to 64.2 in decimal). The rules of our transducer Mtern are shown in Fig. 2. The example is similar to the one used by Knuth [20] to introduce attribute grammars. The transducer is indeed basic and separated: the operators p, s, and z are only used in parameter positions.
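To make the rules of Fig. 2 concrete, here is a small executable sketch of Mtern. Trees are nested tuples, digit leaves are plain ints, and the function names (q0, q, qp for q', r) as well as the numeric evaluator are our own assumptions for illustration, not part of the paper.

```python
# An executable sketch of the transducer M_tern of Fig. 2.

def q0(t):                        # axiom state for the root symbol g
    _, x1, x2 = t
    return ('+', q(x1, 'z'), qp(x2, ('p', 'z')))

def q(t, y):                      # q(f(x1,x2), y) -> +(r(x2,y), q(x1,s(y)))
    if isinstance(t, int):        # phi(i, y) -> *(i, EXP(3, y))
        return ('*', t, ('EXP', 3, y))
    _, x1, x2 = t
    return ('+', r(x2, y), q(x1, ('s', y)))

def qp(t, y):                     # q'(f(x1,x2), y) -> +(r(x1,y), q'(x2,p(y)))
    if isinstance(t, int):
        return ('*', t, ('EXP', 3, y))
    _, x1, x2 = t
    return ('+', r(x1, y), qp(x2, ('p', y)))

def r(i, y):                      # digits only
    return ('*', i, ('EXP', 3, y))

def unary(u):                     # value of the unary exponents z, s(.), p(.)
    return 0 if u == 'z' else unary(u[1]) + (1 if u[0] == 's' else -1)

def value(t):                     # evaluate an output tree numerically
    if isinstance(t, int):
        return t
    if t[0] == '+':
        return value(t[1]) + value(t[2])
    if t[0] == '*':
        return value(t[1]) * value(t[2])
    return t[1] ** unary(t[2])    # ('EXP', 3, u)

# parse tree of the ternary number 2101.02, cf. Fig. 1
t = ('g', ('f', ('f', ('f', 2, 1), 0), 1), ('f', 0, 2))
print(value(q0(t)))               # 64.222... = 2101.02 read in ternary
```

Note how the accumulating parameter y threads the current exponent through the recursion, exactly as the attribute z does in Knuth's original example.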

Our polynomial-time decision procedure works in two phases: first, the transducer is converted into an "earliest" normal form. In this form, output symbols that are not produced within parameter positions are produced as early as possible. In particular, this means that the root output symbols of the right-hand sides of the rules of a single state must differ. For instance, our transducer Mtern is *not* earliest, because all three r-rules produce the same output root symbol (∗). Intuitively, this symbol should be produced earlier, e.g., at the places where state r is called. The earliest form is a common technique used for normal forms and equivalence testing of different kinds of tree transducers [8,13,22]. We show that equivalent states of a transducer in this earliest form produce their state output in exactly the same way. This means, in particular, that the output within parameters is produced at the same places. It therefore remains to check, in the second phase, that these parameter outputs are equivalent, too. To this end, we build an equivalence relation on the states of earliest transducers that combines the two equivalence tests described before. Technically speaking, the equivalence relation is tested by constructing sets of Herbrand equalities. From these equalities, a fixed point algorithm can, after polynomially many iterations, produce a stable set of equalities.

The proofs of Lemmata 1 and 2 can be found in the appendix of an extended version at http://arxiv.org/abs/1902.03858.

#### **2 Separated Basic Macro Tree Transducers**

Let Σ be a ranked alphabet, i.e., every symbol of the finite set Σ has associated with it a fixed rank k ∈ ℕ. Generally, we assume that the input alphabet Σ is *non-trivial*, i.e., Σ has cardinality at least 2 and contains at least one symbol of rank 0 and at least one symbol of rank > 0. The set T_Σ is the set of all (finite, ordered, rooted) trees over the alphabet Σ. We denote a tree as a string over Σ and parentheses and commas, i.e., f(a, f(a, b)) is a tree over Σ, where f is of rank 2 and a, b are of rank zero. We use Dewey dotted decimal notation to refer to a node of a tree: the root node is denoted ε, and for a node u, its i-th child is denoted by u.i. For instance, in the tree f(a, f(a, b)) the b-node is at position 2.2. A *pattern* (or k-pattern) (over Δ) is a tree p ∈ T_{Δ∪{⊤}} over a ranked alphabet Δ and a disjoint nullary symbol ⊤ (with exactly k occurrences of ⊤). The occurrences of the dedicated symbol ⊤ serve as place holders for other patterns. Assume that p is a k-pattern and that p1,...,pk are patterns; then p[p1,...,pk] denotes the pattern obtained from p by replacing, for i = 1,...,k, the i-th occurrence (from left to right) of ⊤ by the pattern pi.
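The substitution p[p1,...,pk] can be sketched in a few lines; the tuple encoding of trees and the string 'T' for the dedicated placeholder symbol are our own assumptions.

```python
# Pattern substitution p[p1,...,pk]: the k occurrences of the placeholder
# are replaced from left to right.
TOP = 'T'

def subst(p, ps):
    """Replace the occurrences of TOP in p, left to right, by the
    patterns in ps; returns p[p1,...,pk]."""
    ps = list(ps)
    def go(t):
        if t == TOP:
            return ps.pop(0)
        if isinstance(t, tuple):
            return (t[0],) + tuple(go(c) for c in t[1:])
        return t
    result = go(p)
    assert not ps, "arity mismatch: more patterns than holes"
    return result

p = ('f', TOP, ('f', 'a', TOP))          # the 2-pattern f(T, f(a, T))
print(subst(p, ['b', TOP]))              # ('f', 'b', ('f', 'a', 'T'))
```

Note that the result of a substitution may again be a pattern, as in the usage above where the second hole is filled with the placeholder itself.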

A *macro tree transducer* (*MTT*) M is a tuple (Q, Σ, Δ, δ) where Q is a ranked alphabet of states, Σ and Δ are the ranked input and output alphabets, respectively, and δ is a finite set of rules of the form:

$$q(f(x\_1, \ldots, x\_k), y\_1, \ldots, y\_l) \to T \tag{1}$$

where q ∈ Q is a state of rank l + 1, l ≥ 0, f ∈ Σ is an input symbol of rank k ≥ 0, x1,...,xk and y1,...,yl are the formal input and output parameters, respectively, and T is a tree built up according to the following grammar:

$$T ::= a(T\_1, \ldots, T\_m) \mid q'(x\_i, T\_1, \ldots, T\_n) \mid y\_j$$

for output symbols a ∈ Δ of rank m ≥ 0 and states q' ∈ Q of rank n + 1, input parameters xi with 1 ≤ i ≤ k, and output parameters yj with 1 ≤ j ≤ l. For simplicity, we assume that all states q have the same number l of parameters. Our definition of an MTT does not contain an initial state. We therefore always consider an MTT together with an axiom A = p[q1(x1, T1),...,qm(x1, Tm)] where T1,...,Tm ∈ T_Δ^l are vectors of output trees (of length l each). Sometimes we use an MTT M without explicitly mentioning an axiom A; then some A is assumed implicitly. Intuitively, a state q of an MTT corresponds to a function in a functional language which is defined through pattern matching over its first argument and which constructs tree output using tree top-concatenation only; the second to (l + 1)-th arguments of state q are its accumulating output parameters. The output produced by a state for a given input tree is determined by the right-hand side T of the rule of the transducer which matches the root symbol f of the current input tree. This right-hand side is built up from accumulating output parameters, calls to states for subtrees of the input, and applications of output symbols from Δ. In general, MTTs are nondeterministic and only partially defined. Here, however, we concentrate on total deterministic transducers. The MTT M is *deterministic* if for every (q, f) ∈ Q × Σ there is at most one rule of the form (1). The MTT M is *total* if for every (q, f) ∈ Q × Σ there is at least one rule of the form (1). For total deterministic transducers, the semantics of a state q ∈ Q with rule q(f(x1,...,xk), y1,...,yl) → T can be considered as a function

$$[q]: \mathcal{T}\_{\Sigma} \times \mathcal{T}\_{\Delta}^{l} \to \mathcal{T}\_{\Delta}$$

which inductively is defined by:

$$[\![q]\!](f(t\_1, \ldots, t\_k), \underline{S}) = [\![T]\!]\,(t\_1, \ldots, t\_k)\,\underline{S}$$

where

$$\begin{aligned} [\![a(T\_1, \ldots, T\_m)]\!]\,\underline{t}\,\underline{S} &= a([\![T\_1]\!]\,\underline{t}\,\underline{S}, \ldots, [\![T\_m]\!]\,\underline{t}\,\underline{S}) \\ [\![y\_j]\!]\,\underline{t}\,\underline{S} &= S\_j \\ [\![q'(x\_i, T\_1, \ldots, T\_l)]\!]\,\underline{t}\,\underline{S} &= [\![q']\!](t\_i, [\![T\_1]\!]\,\underline{t}\,\underline{S}, \ldots, [\![T\_l]\!]\,\underline{t}\,\underline{S}) \end{aligned}$$

where S = (S1,...,Sl) ∈ T_Δ^l is a vector of output trees. The semantics of a pair (M, A) with MTT M and axiom A = p[q1(x1, T1),...,qm(x1, Tm)] is defined by [[(M, A)]](t) = p[[[q1]](t, T1),...,[[qm]](t, Tm)]. Two pairs (M1, A1), (M2, A2) consisting of MTTs M1, M2 and corresponding axioms A1, A2 are *equivalent*, (M1, A1) ≡ (M2, A2), iff for all input trees t ∈ T_Σ, [[(M1, A1)]](t) = [[(M2, A2)]](t).
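The inductive semantics above can be turned into a tiny interpreter. The tuple-based rule encoding ('out', 'call', 'param') and the toy transducer in the usage example are purely our own assumptions, not notation from the paper.

```python
# A minimal interpreter for total deterministic MTT rules, following the
# inductive semantics [[q]](t, S) above.

def run(rules, q, t, params):
    """[[q]](t, params) for an input tree t = (f, t1, ..., tk)."""
    f, subtrees = t[0], t[1:]
    rhs = rules[(q, f)]

    def ev(node):
        kind = node[0]
        if kind == 'param':                  # [[y_j]] t S = S_j
            return params[node[1]]
        if kind == 'out':                    # [[a(T1,...,Tm)]] t S
            return (node[1],) + tuple(ev(c) for c in node[2:])
        _, q2, i, *args = node               # [[q'(x_i, T1,...,Tl)]] t S
        return run(rules, q2, subtrees[i], [ev(a) for a in args])

    return ev(rhs)

# a monadic example: q(a(x), y) -> q(x, g(y));  q(e, y) -> y
rules = {
    ('q', 'a'): ('call', 'q', 0, ('out', 'g', ('param', 0))),
    ('q', 'e'): ('param', 0),
}
print(run(rules, 'q', ('a', ('a', ('e',))), [('b',)]))  # ('g', ('g', ('b',)))
```

The example state wraps one g around its accumulator per input symbol a, illustrating how output is assembled inside the accumulating parameter rather than at the root.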

The MTT M is *basic* if each argument tree Tj of a subtree q'(xi, T1,...,Tn) of a right-hand side T of a rule (1) contains no further occurrences of states, i.e., is in T_{Δ∪Y}. The MTT M is *separated basic* if M is basic and Δ is the disjoint union of ranked alphabets Δout and Δin such that the argument trees Tj of subtrees q'(xi, T1,...,Tn) are in T_{Δin∪Y}, while the output symbols a outside of such subtrees are from Δout. The same must hold for the axiom. Thus, letters directly produced by a state call are in Δout, while letters produced in the parameters are in Δin. The MTT Mtern from the Introduction is separated basic with Δout = {0, 1, 2, 3, ∗, +, EXP} and Δin = {p, s, z}.

As separated basic MTTs are in the focus of our interests, we make the grammar for their right-hand side trees T explicit:

$$\begin{array}{lcl} T & ::= a(T\_1, \ldots, T\_m) \mid y\_j \mid q'(x\_i, T'\_1, \ldots, T'\_n) \\ T' & ::= b(T'\_1, \ldots, T'\_{m'}) \mid y\_j \end{array}$$

where a ∈ Δout, q' ∈ Q, and b ∈ Δin are of ranks m, n + 1, and m', respectively. For separated basic MTTs, only axioms A = p[q1(x1, T1),...,qm(x1, Tm)] with an m-pattern p over Δout and T1,...,Tm ∈ T_{Δin}^l are considered.

Note that equivalence of nondeterministic transducers is undecidable (even for very small subclasses of transductions [18]). Therefore, we assume for the rest of the paper that all MTTs are deterministic and separated basic. We will also assume that all MTTs are total, with the exception of Sect. 5 where we also consider partial MTTs.

*Example 1.* We reconsider the example from the Introduction and adjust it to our formal definition. The transducer was given without an axiom (but with a tacitly assumed "start state" q0). Let us now remove the state q0 and add the axiom A = q(x1, z). The new q-rule for g is:

$$q(g(x\_1, x\_2), y) \rightarrow +(q(x\_1, y), q'(x\_2, p(y))).$$

To make the transducer total, we add for state q' the rule

$$q'(g(x\_1, x\_2), y) \rightarrow + (\*(0, \text{EXP}(3, y)), \*(0, \text{EXP}(3, y))).$$

For state r we add the rules r(α(x1, x2), y) → ∗(0, EXP(3, y)) for α ∈ {f, g}. The MTT is separated basic with Δout = {0, 1, 2, 3, ∗, +, EXP} and Δin = {p, s, z}. 

We restricted ourselves to *total* separated basic MTTs. However, we would like to be able to decide equivalence for *partial* transducers as well. For this reason we now define top-down tree automata, and will then decide equivalence of MTTs relative to some given DTA D. A *deterministic top-down tree automaton* (*DTA*) D is a tuple (B, Σ, b0, δ_D) where B is a finite set of states, Σ is a ranked alphabet of input symbols, b0 ∈ B is the initial state, and δ_D is the partial transition function with rules of the form b(f(x1,...,xk)) → (b1(x1),...,bk(xk)), where b, b1,...,bk ∈ B and f ∈ Σ is of rank k. W.l.o.g. we always assume that all states b of a DTA are productive, i.e., dom(b) ≠ ∅. If we consider an MTT M relative to a DTA D, we implicitly assume a mapping π : Q → B that maps each state of M to a state of D; we then consider for q only input trees in dom(π(q)).

### **3 Top-Down Normalization of Transducers**

In this section we show that each total deterministic basic separated MTT can be put into an "earliest" normal form relative to a fixed DTA D. Intuitively, state output (in Δout) is produced as early as possible for a transducer in the normal form. It can then be shown that two equivalent transducers in normal form produce their state output in exactly the same way.

Recall the definition of patterns as trees over T_{Δ∪{⊤}}. Substitution of occurrences of ⊤ by other patterns induces a partial ordering ⊑ over patterns: p ⊑ p' if and only if p = p'[p1,...,pm] for some patterns p1,...,pm. W.r.t. this ordering, ⊤ is the *largest* element, while all patterns without occurrences of ⊤ are minimal. By adding an artificial *least* element ⊥, the resulting partial ordering is in fact a *complete lattice*. Let us denote this complete lattice by P_Δ.

Let Δ = Δin ∪ Δout. For T ∈ T_{Δ∪Y}, we define the Δout*-prefix* as the pattern p ∈ T_{Δout∪{⊤}} as follows. Assume that T = a(T1,...,Tm). Then

$$\mathrm{pref}\_o(T) = \begin{cases} a(\mathrm{pref}\_o(T\_1), \ldots, \mathrm{pref}\_o(T\_m)) & \text{if } a \in \Delta\_{out} \\ \top & \text{otherwise, i.e., if } a \in \Delta\_{in} \cup Y \end{cases}$$

By this definition, each tree t ∈ T_{Δ∪Y} can be uniquely decomposed into a Δout-prefix p and subtrees t1,...,tm whose root symbols all are contained in Δin ∪ Y such that t = p[t1,...,tm].
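The unique decomposition t = p[t1,...,tm] can be computed in one traversal; the tuple encoding, the string 'T' for the placeholder, and the concrete alphabet split below are assumptions of this sketch.

```python
# Decompose a tree into its Delta_out-prefix and the remaining subtrees,
# cutting at every node whose root symbol is in Delta_in or Y.

def prefix_decompose(t, delta_out):
    """Return (p, subtrees) with t = p[subtrees]."""
    holes = []
    def go(t):
        if isinstance(t, tuple) and t[0] in delta_out:
            return (t[0],) + tuple(go(c) for c in t[1:])
        holes.append(t)          # root in Delta_in or a parameter: cut here
        return 'T'
    return go(t), holes

t = ('+', ('*', ('s', 'z'), 'y1'), ('p', 'z'))
p, sub = prefix_decompose(t, {'+', '*'})
print(p)    # ('+', ('*', 'T', 'T'), 'T')
print(sub)  # [('s', 'z'), 'y1', ('p', 'z')]
```

The subtrees are collected in left-to-right order, so substituting them back into the pattern reconstructs the original tree.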

Let M be a total separated basic MTT and D a given DTA. We define the Δout-prefix of a state q of M relative to D as the minimal pattern p ∈ T_{Δout∪{⊤}} such that each tree [[q]](t, T), t ∈ dom(π(q)), T ∈ T_Δ^l, is of the form p[T1,...,Tm] for some sequence of subtrees T1,...,Tm ∈ T_Δ. Let us denote this unique pattern p by pref_o(q). If q(f(x1,...,xk), y1,...,yl) → T is a rule of a separated basic MTT and there is an input tree f(t1,...,tk) ∈ dom(π(q)), then |pref_o(q)| ≤ |T|.

**Lemma 1.** *Let* M *be a total separated basic MTT and* D *a given DTA. Let* t ∈ dom(π(q)) *be a smallest input tree of a state* q *of* M*. The* Δout*-prefix of every state* q *of* M *relative to* D *can be computed in time* O(|t|·|M|)*.*

The proof is similar to the one of [8, Theorem 8] for top-down tree transducers. This construction can be carried over as, for the computation of Δout-prefixes, the precise contents of the output parameters y<sup>j</sup> can be ignored.

*Example 2.* We compute the Δout-prefixes of the MTT M from Example 1. We consider M relative to the trivial DTA D that consists of a single state b with dom(b) = T_Σ; we therefore omit D in this example. We obtain the following system of in-equations: from the rules of state r we obtain Y_r ⊒ ∗(i, EXP(3, ⊤)) with i ∈ {0, 1, 2}. From the rules of state q we obtain Y_q ⊒ +(Y_q, Y_{q'}), Y_q ⊒ +(Y_r, Y_q) and Y_q ⊒ ∗(i, EXP(3, ⊤)) with i ∈ {0, 1, 2}. From the rules of state q' we obtain Y_{q'} ⊒ +(∗(0, EXP(3, ⊤)), ∗(0, EXP(3, ⊤))), Y_{q'} ⊒ +(Y_r, Y_{q'}) and Y_{q'} ⊒ ∗(i, EXP(3, ⊤)) with i ∈ {0, 1, 2}. For the fixpoint iteration we initialize Y_r^{(0)}, Y_q^{(0)}, Y_{q'}^{(0)} with ⊥ each. Then Y_r^{(1)} = ∗(⊤, EXP(3, ⊤)) = Y_r^{(2)} and Y_q^{(1)} = ⊤, Y_{q'}^{(1)} = ⊤. Thus, the fixpoint iteration ends after two rounds with the solution pref_o(q) = pref_o(q') = ⊤ and pref_o(r) = ∗(⊤, EXP(3, ⊤)). 
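The fixpoint iteration of Example 2 can be sketched as follows. BOT plays the role of the artificial least element, the string 'T' the role of the placeholder, and join computes the least upper bound of two patterns (anti-unification). The rule encoding and the restriction to a fragment of the transducer are our own assumptions.

```python
# Fixpoint computation of Delta_out-prefixes over the pattern lattice.
BOT, TOP = None, 'T'

def join(p1, p2):
    if p1 is BOT:
        return p2
    if p2 is BOT:
        return p1
    if p1 == TOP or p2 == TOP:
        return TOP
    if isinstance(p1, tuple) and isinstance(p2, tuple) \
            and p1[0] == p2[0] and len(p1) == len(p2):
        return (p1[0],) + tuple(join(a, b) for a, b in zip(p1[1:], p2[1:]))
    return p1 if p1 == p2 else TOP

def abstract(rhs, Y):
    """Pattern of a right-hand side: output symbols stay, state calls
    contribute their current prefix, parameters become holes."""
    if rhs[0] == 'out':
        return (rhs[1],) + tuple(abstract(c, Y) for c in rhs[2:])
    if rhs[0] == 'call':
        return Y[rhs[1]]
    return TOP                                   # ('param', j)

def prefixes(rules):
    Y = {q: BOT for q, _ in rules}               # all prefixes start at bottom
    while True:
        Ynew = {q: BOT for q in Y}
        for (q, _), rhs in rules.items():
            Ynew[q] = join(Ynew[q], abstract(rhs, Y))
        if Ynew == Y:
            return Y
        Y = Ynew

# fragment of M_tern: two digit rules for r, a digit rule and f-rule for q
E = ('out', 'EXP', ('out', '3'), ('param', 0))
rules = {
    ('r', '0'): ('out', '*', ('out', '0'), E),
    ('r', '1'): ('out', '*', ('out', '1'), E),
    ('q', 'f'): ('out', '+', ('call', 'r'), ('call', 'q')),
    ('q', '0'): ('out', '*', ('out', '0'), E),
}
print(prefixes(rules))  # {'r': ('*', 'T', ('EXP', ('3',), 'T')), 'q': 'T'}
```

As in the example, the two digit rules of r disagree only at the digit position, so their join keeps the common shape ∗(⊤, EXP(3, ⊤)), while the clash between + and ∗ drives the prefix of q to ⊤.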

Let M be a separated basic MTT and D a given DTA. M is called D*-earliest* if for every state q ∈ Q the Δout-prefix with respect to π(q) is ⊤.

**Lemma 2.** *For every pair* (M, A) *consisting of a total separated basic MTT* M *and axiom* A*, and a given DTA* D*, an equivalent pair* (M', A') *can be constructed such that* M' *is a total separated basic MTT that is* D*-earliest. Let* t' *be an output tree of* (M, A) *for a smallest input tree* t ∈ dom(π(q)) *where* q *is the state occurring in* A*. Then the construction runs in time* O(|t'| · |(M, A)|)*.*

The construction follows the same lines as the one for the earliest form of top-down tree transducers, cf. [8, Theorem 11]. Note that for partial separated basic MTTs the size of the Δout-prefixes can be exponential in the size of the transducer. However, for the total transducers that we consider here, the Δout-prefixes are linear in the size of the transducer and can be computed in quadratic time, cf. [8].

**Corollary 1.** *For* (M, A) *consisting of a total deterministic separated basic MTT* M *and axiom* A*, and the trivial DTA* D *accepting* T_Σ*, an equivalent pair* (M', A') *can be constructed in quadratic time such that* M' *is a* D*-earliest total deterministic separated basic MTT.*

*Example 3.* We construct an equivalent earliest MTT M' for the transducer from Example 1. In Example 2 we already computed the Δout-prefixes of the states q, q', r: pref_o(q) = ⊤, pref_o(q') = ⊤ and pref_o(r) = ∗(⊤, EXP(3, ⊤)). As there is only one occurrence of the symbol ⊤ in the Δout-prefixes of q and q', we call the states ⟨q, 1⟩ and ⟨q', 1⟩ simply q and q', respectively. Hence, the corresponding earliest transducer has axiom A' = q(x1, z). The rules of q and q' for input symbol g do not change. For input symbol f we obtain

$$q(f(x\_1, x\_2), y) \to +(\*(r(x\_2, y), \text{EXP}(3, y)), q(x\_1, s(y))) \text{ and } q'(f(x\_1, x\_2), y) \to +(\*(r(x\_1, y), \text{EXP}(3, y)), q'(x\_2, p(y))).$$

As there is only one occurrence of ⊤ that is related to a recursive call in pref_o(r), we call ⟨r, 1⟩ simply r. For state r we obtain the new rules r(α(x1, x2), y) → 0 with α ∈ {f, g} and r(i, y) → i with i ∈ {0, 1, 2}. 

We define a family of equivalence relations by induction. For a state b of a given DTA, ≅_b ⊆ ((Q, T_{Δin}^l) ∪ T_{Δin}) × ((Q, T_{Δin}^l) ∪ T_{Δin}) is the intersection of the equivalence relations ≅_b^{(h)}, i.e., X ≅_b Z if and only if for all h ≥ 0, X ≅_b^{(h)} Z. We let (q, T) ≅_b^{(h+1)} (q', T') if for all f ∈ dom(b) with b(f(x1,...,xk)) → (b1(x1),...,bk(xk)), there is a pattern p such that q(f(x1,...,xk), y) → p[t1,...,tm] and q'(f(x1,...,xk), y') → p[t'1,...,t'm] where, for each i = 1,...,m, one of the following holds:

– ti and t'i are both in T_{Δin∪Y} and ti[T/y] = t'i[T'/y'];
– ti = qi(x_{ji}, S) and t'i ∈ T_{Δin∪Y} with t'i[T'/y'] ≅_{b_{ji}}^{(h)} (qi, S[T/y]) (and symmetrically with the roles of ti and t'i exchanged);
– ti = qi(x_{ji}, S) and t'i = q'i(x_{ji}, S') call the same input variable, and (qi, S[T/y]) ≅_{b_{ji}}^{(h)} (q'i, S'[T'/y']);
– ti = qi(x_{ji}, S) and t'i = q'i(x_{j'i}, S') with ji ≠ j'i, and there is some s ∈ T_{Δin} such that s ≅_{b_{ji}}^{(h)} (qi, S[T/y]) and s ≅_{b_{j'i}}^{(h)} (q'i, S'[T'/y']).

We let T ≅_b^{(h+1)} (q', T') if for all f ∈ dom(b) with b(f(x1,...,xk)) → (b1(x1),...,bk(xk)) and q'(f(x1,...,xk), y') → t', either t' ∈ T_{Δin∪Y} and t'[T'/y'] = T, or t' = q''(xi, S') and T ≅_{bi}^{(h)} (q'', S'[T'/y']).

Intuitively, (q, T) ≅_b^{(h)} (q', T') if for all input trees t ∈ dom(b) of height at most h, [[q]](t, T) = [[q']](t, T'). Then (q, T) ≅_b (q', T') if for *all* input trees t ∈ dom(b) (independent of the height), [[q]](t, T) = [[q']](t, T').

**Theorem 1.** *For a given DTA* D *with initial state* b*, let* M, M' *be* D*-earliest total deterministic separated basic MTTs with axioms* A *and* A'*, respectively. Then* (M, A) *is equivalent to* (M', A') *relative to* D *iff there is a pattern* p *such that* A = p[q1(x1, T1),...,qm(x1, Tm)] *and* A' = p[q'1(x1, T'1),...,q'm(x1, T'm)]*, and for* j = 1,...,m*,* (qj, Tj) ≅_b (q'j, T'j)*, i.e.,* qj *and* q'j *are equivalent on the values of output parameters* Tj *and* T'j*.*

*Proof.* Let Δ be the output alphabet of M and M'. Assume that (M, A) is equivalent to (M', A') relative to D. As M and M' are earliest, the Δout-prefix of [[(M, A)]](t) and [[(M', A')]](t), for t ∈ dom(b), is the same pattern p, and therefore A = p[q1(x1, T1),...,qm(x1, Tm)] and A' = p[q'1(x1, T'1),...,q'm(x1, T'm)]. To show that (qi, Ti) ≅_b (q'i, T'i), let ui be the position of the i-th ⊤-node in the pattern p. For t ∈ dom(b), let ti and t'i be the subtrees at position ui of [[(M, A)]](t) and [[(M', A')]](t), respectively. Then ti = t'i and therefore (qi, Ti) ≅_b (q'i, T'i).

Now assume that the axioms A = p[q1(x1, T1),...,qm(x1, Tm)] and A' = p[q'1(x1, T'1),...,q'm(x1, T'm)] consist of the same pattern p and, for i = 1,...,m, (qi, Ti) ≅_b (q'i, T'i). Let t ∈ dom(b) be an input tree; then

$$\begin{aligned} [\![(M,A)]\!](t) &= p[[\![q\_1]\!](t, \underline{T\_1}), \ldots, [\![q\_m]\!](t, \underline{T\_m})] \\ &= p[[\![q'\_1]\!](t, \underline{T'\_1}), \ldots, [\![q'\_m]\!](t, \underline{T'\_m})] \\ &= [\![(M',A')]\!](t). \end{aligned}$$

### **4 Polynomial Time**

In this section we prove the main result of this paper, namely, that for each fixed DTA D, equivalence of total deterministic basic separated MTTs (relative to D) can be decided in polynomial time. This is achieved by taking as input two D-earliest such transducers, and then collecting conditions on the parameters of pairs of states of the respective transducers for their produced outputs to be equal.

*Example 4.* Consider a DTA D with a single state b only, which accepts all inputs, and states q, q' with

$$q(a, y\_1, y\_2) \rightarrow g(y\_1) \qquad q'(a, y\_1', y\_2') \rightarrow g(y\_2')$$

Then q and q' can only produce identical outputs for the input a (in dom(b)) if the parameter y'2 of q' contains the same output tree as the parameter y1 of q. This precondition can be formalized by the equality y'2 ≐ y1. Note that, in order to distinguish the output parameters of q' from those of q, we have used primed copies y'i for q'. 

It turns out that *conjunctions* of equalities such as the one in Example 4 are sufficient for proving equivalence of states. For states q, q' of total separated basic MTTs M, M', respectively, that are both D-earliest for some fixed DTA D, h ≥ 0, and some fresh variable z, we define

$$\Psi\_{b,q}^{(h)}(z) = \bigwedge\_{b(f\underline{x}) \to (b\_1, \ldots, b\_k)} \Bigg( \bigwedge\_{q(f\underline{x}, \underline{y}) \to y\_j} (z \doteq y\_j) \;\land \bigwedge\_{q(f\underline{x}, \underline{y}) \to \hat{q}(x\_i, \underline{T})} \Psi\_{b\_i, \hat{q}}^{(h-1)}(z)[\underline{T}/\underline{y}] \;\land \bigwedge\_{q(f\underline{x}, \underline{y}) \to p[\ldots]} \bot \Bigg)$$

where ⊥ is the boolean value *false*. We denote the output parameters in Ψ^{(h)}_{b,q}(z) by y; we define Ψ'^{(h)}_{b,q'}(z) along the same lines as Ψ^{(h)}_{b,q}(z), but using y' for the output parameters. To substitute the output parameters with trees T, T', we therefore write Ψ^{(h)}_{b,q}(z)[T/y] and Ψ'^{(h)}_{b,q'}(z)[T'/y']. Assuming that q is a state of the D-earliest separated basic MTT M, Ψ^{(h)}_{b,q}(z) is true for ground parameter values s and z = T[s/y] for some T ∈ T_{Δ∪Y} iff [[q]](t, s) = T[s/y] for all input trees t ∈ dom(b) of height at most h. Note that, since M is D-earliest, T is necessarily in T_{Δin∪Y}. W.l.o.g. we assume that every state b of D is productive, i.e., dom(b) ≠ ∅. For each state b of D we may therefore choose some input tree t_b ∈ dom(b) of minimal depth. We define s_{b,q} to be the output of q for the minimal input tree t_b ∈ dom(b) and formal parameter values y, when considering the output parameters as output symbols in Δin, i.e., s_{b,q} = [[q]](t_b, y).

*Example 5.* We consider again the trivial DTA D with only one state b that accepts all t ∈ T_Σ. Thus, we may choose t_b = a. For a state q with the two rules q(a, y1, y2) → y1 and q(f(x), y1, y2) → q(x, h(y2), b), we have s_{b,q} = y1. Moreover, we obtain

$$\begin{aligned} \Psi\_{b,q}^{(0)}(z) &= (z \doteq y\_1) \\ \Psi\_{b,q}^{(1)}(z) &= (z \doteq y\_1) \land (z \doteq h(y\_2)) \\ \Psi\_{b,q}^{(2)}(z) &= (z \doteq y\_1) \land (z \doteq h(y\_2)) \land (z \doteq h(b)) \\ &\equiv (y\_2 \doteq b) \land (y\_1 \doteq h(b)) \land (z \doteq h(b)) \\ \Psi\_{b,q}^{(3)}(z) &= (z \doteq y\_1) \land (b \doteq b) \land (h(y\_2) \doteq h(b)) \land (z \doteq h(b)) \\ &\equiv (y\_2 \doteq b) \land (y\_1 \doteq h(b)) \land (z \doteq h(b)) \end{aligned}$$

We observe that Ψ^{(2)}_{b,q}(z) ≡ Ψ^{(3)}_{b,q}(z) and therefore, for every h ≥ 2, Ψ^{(h)}_{b,q}(z) ≡ Ψ^{(2)}_{b,q}(z). 
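The iteration of Example 5 can be replayed mechanically: a conjunction is kept as a list of term equations and brought into reduced form by syntactic unification. The encodings and helper names below are illustrative assumptions, not the paper's algorithm verbatim.

```python
# Replaying the computation of Psi^{(h)}_{b,q}(z) from Example 5.
# Terms: variables are strings, constructors are tuples.

def walk(t, s):
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def unify(eqs):
    s, work = {}, list(eqs)
    while work:
        a, b = work.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if isinstance(a, str):
            s[a] = b
        elif isinstance(b, str):
            s[b] = a
        elif a[0] == b[0] and len(a) == len(b):
            work += list(zip(a[1:], b[1:]))
        else:
            return None                  # clash: the conjunction is false
    return s

def resolve(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(c, s) for c in t[1:])
    return t

def canon(s):                            # normal forms of z, y1, y2
    return {v: resolve(v, s) for v in ('z', 'y1', 'y2')}

def shift(t, m):                         # parameter substitution in terms
    if isinstance(t, tuple):
        return (t[0],) + tuple(shift(c, m) for c in t[1:])
    return m.get(t, t)

# rules of q:  q(a, y1, y2) -> y1   and   q(f(x), y1, y2) -> q(x, h(y2), b)
m = {'y1': ('h', 'y2'), 'y2': ('b',)}    # effect of the f-rule on y1, y2
eqs, prev = [('z', 'y1')], None          # Psi^{(0)}(z) = (z = y1)
while prev != canon(unify(eqs)):
    prev = canon(unify(eqs))
    eqs = [('z', 'y1')] + [(shift(l, m), shift(r, m)) for l, r in eqs]
result = canon(unify(eqs))
print(result)  # {'z': ('h', ('b',)), 'y1': ('h', ('b',)), 'y2': ('b',)}
```

The stable result matches the reduced form reached in Example 5: y2 ≐ b, y1 ≐ h(b) and z ≐ h(b).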

In analogy to the equivalence relation ≅_b, with b a state of the DTA D, we define for states q, q' of D-earliest total deterministic separated basic MTTs M, M', and h ≥ 0, the conjunction Φ^{(h)}_{b,(q,q')} by

$$\begin{aligned} \Phi\_{b,(q,q')}^{(h)} = \bigwedge\_{\substack{b(f\underline{x}) \to (b\_1, \ldots, b\_k)\\ q(f\underline{x}, \underline{y}) \to p[\underline{t}],\; q'(f\underline{x}, \underline{y}') \to p[\underline{t}']}} \Bigg( & \bigwedge\_{t\_i = y\_{j\_i},\; t'\_i = y'\_{j'\_i}} (y\_{j\_i} \doteq y'\_{j'\_i}) \;\land \\ & \bigwedge\_{t\_i = y\_{j\_i},\; t'\_i = q'\_i(x\_{j'\_i}, \underline{T}')} \Psi'^{(h-1)}\_{b\_{j'\_i}, q'\_i}(y\_{j\_i})[\underline{T}'/\underline{y}'] \;\land \\ & \bigwedge\_{t\_i = q\_i(x\_{j\_i}, \underline{T}),\; t'\_i = y'\_{j'\_i}} \Psi^{(h-1)}\_{b\_{j\_i}, q\_i}(y'\_{j'\_i})[\underline{T}/\underline{y}] \;\land \\ & \bigwedge\_{\substack{t\_i = q\_i(x\_{j\_i}, \underline{T}),\; t'\_i = q'\_i(x\_{j'\_i}, \underline{T}'),\\ j\_i = j'\_i}} \Phi^{(h-1)}\_{b\_{j\_i}, (q\_i, q'\_i)}[\underline{T}/\underline{y}, \underline{T}'/\underline{y}'] \;\land \\ & \bigwedge\_{\substack{t\_i = q\_i(x\_{j\_i}, \underline{T}),\; t'\_i = q'\_i(x\_{j'\_i}, \underline{T}'),\\ j\_i \neq j'\_i}} \Big( \Psi^{(h-1)}\_{b\_{j\_i}, q\_i}(s\_{b\_{j\_i}, q\_i})[\underline{T}/\underline{y}] \land \Psi'^{(h-1)}\_{b\_{j'\_i}, q'\_i}(s\_{b\_{j\_i}, q\_i}[\underline{T}/\underline{y}])[\underline{T}'/\underline{y}'] \Big) \Bigg) \\ \land \bigwedge\_{\substack{b(f\underline{x}) \to (b\_1, \ldots, b\_k)\\ q(f\underline{x}, \underline{y}) \to p[\underline{t}],\; q'(f\underline{x}, \underline{y}') \to p'[\underline{t}'],\; p \neq p'}} \bot \hspace{4.5cm} \end{aligned}$$

Φ^{(h)}_{b,(q,q')} is defined along the same lines as the equivalence relation ≅^{(h)}_b: Φ^{(h)}_{b,(q,q')} is true exactly for those values of the output parameters T, T' for which [[q]](t, T) = [[q']](t, T') holds for all t ∈ dom(b) of height at most h. By induction on h ≥ 0, we obtain:

**Lemma 3.** *For a given DTA* D*, states* q, q' *of* D*-earliest total separated basic MTTs, vectors of trees* T, T' *over* Δin*, a state* b *of* D*,* s ∈ T_{Δin}*, and* h ≥ 0*, the following two statements hold:*

$$(q, \underline{T}) \cong\_{b}^{(h)} (q', \underline{T'}) \Leftrightarrow \Phi\_{b,(q,q')}^{(h)}[\underline{T}/\underline{y}, \underline{T'}/\underline{y'}] \equiv \text{true}$$

$$s \cong\_{b}^{(h)} (q', \underline{T'}) \Leftrightarrow \Psi\_{b,q'}^{(h)}(s)[\underline{T'}/\underline{y'}] \equiv \text{true}$$

Φ^{(h)}_{b,(q,q')} is a conjunction of equations of the form yi ≐ yj or yi ≐ t with t ∈ T_{Δin∪Y}. Every satisfiable conjunction of equalities is equivalent to a (possibly empty) finite conjunction of equations of the form yi ≐ ti, ti ∈ T_{Δin∪Y}, where the yi are distinct and no equation is of the form yj ≐ yj. We call such conjunctions *reduced*. If we have two inequivalent reduced conjunctions φ1 and φ2 with φ1 ⇒ φ2, then φ1 contains strictly more equations. It follows that every sequence φ0 ⇒ ... ⇒ φm of pairwise inequivalent reduced conjunctions φj with k variables satisfies m ≤ k + 1. This observation is crucial for the termination of the fixpoint iteration we will use to compute Φ^{(h)}_{b,(q,q')}.
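Detecting that the iteration has stabilized amounts to testing implication between reduced conjunctions. A minimal sketch, assuming a dict-based solved form and the concrete equations below as illustrations:

```python
# Implication test between reduced conjunctions: a reduced conjunction
# is a dict mapping distinct variables to terms.

def resolve(t, s):
    while isinstance(t, str) and t in s:
        t = s[t]
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(c, s) for c in t[1:])
    return t

def implies(phi1, phi2):
    """phi1 => phi2 iff every equation y = t of phi2 already holds
    under phi1, i.e., both sides have the same normal form w.r.t. phi1."""
    return all(resolve(y, phi1) == resolve(t, phi1) for y, t in phi2.items())

phi1 = {'y2': ('b',), 'y1': ('h', ('b',))}   # y2 = b  and  y1 = h(b)
phi2 = {'y1': ('h', 'y2')}                   # y1 = h(y2)
print(implies(phi1, phi2))   # True: the stronger phi1 entails phi2
print(implies(phi2, phi1))   # False
```

Since an implication chain of pairwise inequivalent reduced conjunctions over k variables has length at most k + 1, this test is invoked only polynomially often.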

For h ≥ 0 we have:

$$
\Psi\_{b,q}^{(h)}(z) \Rightarrow \Psi\_{b,q}^{(h-1)}(z) \tag{2}
$$

$$
\Phi\_{b,(q,q')}^{(h)} \Rightarrow \Phi\_{b,(q,q')}^{(h-1)} \tag{3}
$$

As we fixed the number of output parameters of all states to l, for each pair (q, q') the conjunction Φ^{(h)}_{b,(q,q')} contains at most 2l variables yi, y'i. Assuming that the MTTs to which the states q and q' belong have n states each, we conclude that Φ^{(n²(2l+1))}_{b,(q,q')} ≡ Φ^{(n²(2l+1)+i)}_{b,(q,q')} and Ψ^{(n(l+1))}_{b,q} ≡ Ψ^{(n(l+1)+i)}_{b,q} for all i ≥ 0. Thus, we can define Φ_{b,(q,q')} := Φ^{(n²(2l+1))}_{b,(q,q')} and Ψ_{b,q} := Ψ^{(n(l+1))}_{b,q}. As (q, T) ≅_b (q', T') iff (q, T) ≅^{(h)}_b (q', T') for all h ≥ 0, observation (3) implies that

$$(q, \underline{T}) \cong\_b (q', \underline{T'}) \Leftrightarrow \Phi\_{b,(q,q')}[\underline{T}/\underline{y}][\underline{T'}/\underline{y'}] \equiv \text{true}$$

Therefore, we have:

**Lemma 4.** *For a DTA* D*, states* q, q' *of* D*-earliest separated basic MTTs* M, M'*, and a state* b *of* D*, the formula* Φ_{b,(q,q')} *can be computed in time polynomial in the sizes of* M *and* M'*.*

*Proof.* We successively compute the conjunctions Ψ^{(h)}_{b,q}(z), Ψ'^{(h)}_{b,q'}(z), Φ^{(h)}_{b,(q,q')}, h ≥ 0, for all states b, q, q'. As discussed before, some h ≤ n²(2l + 1) exists such that the conjunctions for h + 1 are equivalent to the corresponding conjunctions for h, in which case we terminate. It remains to prove that the conjunctions for h can be computed from the conjunctions for h − 1 in polynomial time. For that, it is crucial that we maintain *reduced* conjunctions. Nonetheless, the *sizes* of the occurring right-hand sides of equalities may be quite large. Consider for example the conjunction x1 ≐ a ∧ x2 ≐ f(x1, x1) ∧ ... ∧ xn ≐ f(x_{n−1}, x_{n−1}). The corresponding reduced conjunction is given by x1 ≐ a ∧ x2 ≐ f(a, a) ∧ ... ∧ xn ≐ f(f(... f(a, a) ...), f(... f(a, a) ...)), where the sizes of the right-hand sides grow exponentially. In order to arrive at a polynomial-size representation, we therefore rely on compact representations where isomorphic subtrees are represented only once. W.r.t. this representation, reduction of a non-reduced conjunction, implications between reduced conjunctions, as well as substitution of variables in conjunctions can all be realized in polynomial time. From that, the assertion of the lemma follows.
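The compact representation mentioned in the proof is classic hash-consing; here is a minimal sketch (class and encoding are our own assumptions) applied to exactly the exponential chain from the proof:

```python
# Compact DAG representation: isomorphic subtrees are stored only once,
# so the chain x1 = a, x_{i+1} = f(x_i, x_i) needs one node per equation
# even though the unfolded right-hand sides grow exponentially.

class Dag:
    def __init__(self):
        self.table = {}                 # (symbol, child ids...) -> node id

    def node(self, sym, *children):
        key = (sym,) + children
        return self.table.setdefault(key, len(self.table))

d = Dag()
x = d.node('a')                         # x1 = a
for _ in range(50):
    x = d.node('f', x, x)               # x_{i+1} = f(x_i, x_i)
print(len(d.table))                     # 51 nodes; the unfolded tree has 2**51 - 1
```

Equality of two represented trees reduces to comparing node ids, which is what makes reduction, implication and substitution polynomial on this representation.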

*Example 6.* Let D be a DTA with the rules b(f(x)) → (b(x)), b(g) → () and b(h) → (). Let q and q' be states of separated basic MTTs M, M', respectively, that are D-earliest, and let π, π' be the mappings from the states of M, M' to the states of D with π(q) = b and π'(q') = b.

$$\begin{array}{c} q(f(x), y\_1, y\_2) \to a(q(x, b(y\_1, y\_1), c(y\_2)), d) \\ q(g, y\_1, y\_2) \to y\_1 \\ q(h, y\_1, y\_2) \to y\_2 \end{array}$$

$$\begin{array}{c} q'(f(x), y\_1', y\_2') \to a(q'(x, c(y\_1'), b(y\_2', y\_2')), d) \\ q'(g, y\_1', y\_2') \to y\_2' \\ q'(h, y\_1', y\_2') \to y\_1' \end{array}$$

$$\begin{aligned} \Phi^{(0)}\_{b,(q,q')} &= (y\_1 \doteq y'\_2) \land (y\_2 \doteq y'\_1) \\ \Phi^{(1)}\_{b,(q,q')} &= (y\_1 \doteq y'\_2) \land (y\_2 \doteq y'\_1) \land (b(y\_1, y\_1) \doteq b(y'\_2, y'\_2)) \land (c(y\_2) \doteq c(y'\_1)) \\ &\equiv (y\_1 \doteq y'\_2) \land (y\_2 \doteq y'\_1) \equiv \Phi^{(0)}\_{b,(q,q')} \end{aligned}$$

In summary, we obtain the main theorem of our paper.

**Theorem 2.** *Let* (M, A) *and* (M', A') *be pairs consisting of total deterministic separated basic MTTs* M, M' *with corresponding axioms* A, A'*, and let* D *be a DTA. Then the equivalence of* (M, A) *and* (M', A') *relative to* D *is decidable. If* D *accepts all input trees, equivalence can be decided in polynomial time.*

*Proof.* By Lemma 2 we build pairs (M1, A1) and (M'1, A'1) that are equivalent to (M, A) and (M', A'), respectively, where M1, M'1 are D-earliest separated basic MTTs. If D is trivial, the construction takes polynomial time, cf. Corollary 1. Let the axioms be A1 = p[q1(x1, T1),...,qk(x1, Tk)] and A'1 = p'[q'1(x1, T'1),...,q'_{k'}(x1, T'_{k'})]. According to Lemma 3, (M1, A1) and (M'1, A'1) are equivalent iff

– $p = p'$, $k = k'$, and
– for all $j = 1, \ldots, k$, $\Phi_{b,(q_j, q'_j)}[T_j/y, T'_j/y']$ is equivalent to true.

By Lemma 4 we can decide the second condition in time polynomial in the sizes of $M_1$ and $M'_1$.

# **5 Applications**

In this section we show several applications of our equivalence result. First, we consider partial transductions of separated basic MTTs. To decide the equivalence of partial transductions we need to decide (a) whether the domains of two given MTTs are the same and, if so, (b) whether the transductions on this domain are the same. The second part of the decision procedure was shown in detail in this paper for the case that the domain is given by a DTA. It therefore remains to discuss how this DTA can be obtained. It was shown in [4, Theorem 3.1] that the domain of every top-down tree transducer T is accepted by some DTA B<sup>T</sup>, and that this automaton can be constructed from T in exponential time. This construction can easily be extended to basic MTTs. Equivalence of DTAs is well known to be decidable in polynomial time [16,17]. To obtain a total transducer we add, for each pair (q, f), q ∈ Q and f ∈ Σ, that has no rule, a new rule q(f(x), y) → ⊥, where ⊥ is an arbitrary symbol in Δout of rank zero.
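The totalization step at the end of this paragraph can be sketched as follows (a minimal illustration with invented states, input symbols and right-hand sides; `'BOT'` stands for the fresh rank-zero output symbol ⊥):

```python
# Sketch: making a partial rule table total by adding a default rule
# q(f(x), y) -> BOT for every pair (q, f) that has no rule yet.
states  = {'q0', 'q1'}
symbols = {'f', 'g', 'h'}                                 # input symbols
rules   = {('q0', 'f'): 'rhs1', ('q0', 'g'): 'rhs2'}      # partial table

for q in states:
    for f in symbols:
        rules.setdefault((q, f), 'BOT')    # default rule with rhs ⊥

assert all((q, f) in rules for q in states for f in symbols)  # now total
assert rules[('q0', 'f')] == 'rhs1'        # the original rules are kept
```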

*Example 7.* In Example 1 we discussed how to adjust the transducer from the introduction to our formal definition. We therefore had to introduce additional rules to obtain a total transducer. Now we again add rules for the same pairs (q, f), but with right-hand side ⊥. The original domain of the transducer is then given by the DTA D = (R, Σ, r0, δD) with the rules r0(g(x1, x2)) → (r(x1), r(x2)), r(f(x1, x2)) → (r(x1), r(x2)) and r(i) → () for i = 1, 2, 3.
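The DTA D of this example can be sketched operationally (an illustrative encoding of top-down runs; trees are nested tuples and the rule table mirrors the rules above):

```python
# Sketch of a top-down run of the DTA D from Example 7.
# Inner nodes: ('g', left, right) or ('f', left, right); leaves: 1, 2, 3.
delta  = {('r0', 'g'): ('r', 'r'), ('r', 'f'): ('r', 'r')}
leaves = {1, 2, 3}                          # rules r(i) -> () for i = 1,2,3

def accepts(state, t):
    if t in leaves:
        return state == 'r'                 # only r may read a leaf
    sym, children = t[0], t[1:]
    if (state, sym) not in delta:
        return False                        # no applicable rule: reject
    return all(accepts(q, c) for q, c in zip(delta[(state, sym)], children))

assert accepts('r0', ('g', ('f', 1, 2), 3))
assert not accepts('r0', ('f', 1, 2))       # f is not allowed at the root
```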

**Corollary 2.** *The equivalence of deterministic separated basic MTTs with a* partial *transition function is decidable.*

Next, we show that our result can be used to decide the equivalence of total separated basic MTTs with look-ahead. A total macro tree transducer with regular look-ahead (MTT$^R$) is a tuple $(Q, \Sigma, \Delta, \delta, R, \delta_R)$ where $R$ is a finite set of look-ahead states and $\delta_R$ provides, for every $f \in \Sigma^{(k)}$, a total function $R^k \to R$. Additionally we have a deterministic bottom-up tree automaton $(P, \Sigma, \delta, -)$ (without final states). A rule of the MTT is of the form

$$q(f(t_1, \ldots, t_k), y_1, \ldots, y_k) \to t \qquad \langle p_1, \ldots, p_k \rangle$$

and is applicable to an input tree $f(t_1, \ldots, t_k)$ if the look-ahead automaton accepts $t_i$ in state $p_i$ for all $i = 1, \ldots, k$. For every $q, f, p_1, \ldots, p_k$ there is exactly one such rule. Let $N_1 = (Q_1, \Sigma_1, \Delta_1, \delta_1, R_1, \delta_{R_1})$ and $N_2 = (Q_2, \Sigma_2, \Delta_2, \delta_2, R_2, \delta_{R_2})$ be two total separated basic MTTs with look-ahead. We construct total separated basic MTTs $M_1, M_2$ *without* look-ahead as follows. The input alphabet contains, for every $f \in \Sigma$ and $r_1, \ldots, r_k \in R_1$, $r'_1, \ldots, r'_k \in R_2$, the symbols $\langle f, r_1, \ldots, r_k, r'_1, \ldots, r'_k \rangle$. For $q(f(x_1, \ldots, x_k), y) \to p[T_1, \ldots, T_m]$ with look-ahead $\langle r_1, \ldots, r_k \rangle$ and $q'(f(x_1, \ldots, x_k), y') \to p'[T'_1, \ldots, T'_m]$ with look-ahead $\langle r'_1, \ldots, r'_k \rangle$ we obtain for $M_1$ the rules

$$\hat{q}(\langle f(x_1, \ldots, x_k), r_1, \ldots, r_k, r_1', \ldots, r_k' \rangle, \underline{y}) \to p[\hat{T}_1, \ldots, \hat{T}_m]$$

with $\hat{T}_i = \hat{q}_i(x_{j_i}, \hat{r}_1, \ldots, \hat{r}_l, \hat{r}'_1, \ldots, \hat{r}'_l, Z_i)$ if $T_i = q_i(x_{j_i}, Z_i)$, where $q_i(x_{j_i}, y) \to \hat{T}_i$ has look-ahead $\langle \hat{r}_1, \ldots, \hat{r}_l \rangle$ and $q'_i(x_{j_i}, y') \to \hat{T}'_i$ has look-ahead $\langle \hat{r}'_1, \ldots, \hat{r}'_l \rangle$. If $T_i = y_{j_i}$ then $\hat{T}_i = y_{j_i}$. The total separated basic MTT $M_2$ is constructed along the same lines. Thus, $N_i$, $i = 1, 2$, can be simulated by $M_i$, $i = 1, 2$, respectively, if the input is restricted to the regular tree language of new input trees that represent correct runs of the look-ahead automata.

**Corollary 3.** *The equivalence of total separated basic MTTs with regular lookahead is decidable in polynomial time.*

Last, we consider separated basic MTTs that concatenate strings instead of trees in the parameters. We abbreviate this class of transducers by MTTyp. Thus, the alphabet Δin is no longer a ranked alphabet but an unranked alphabet whose elements/letters can be concatenated to words. The procedure to decide equivalence of MTTyp is essentially the one discussed in this paper, but instead of conjunctions of equations of trees over Δin ∪ Y we obtain conjunctions of equations of words. Word equations are a well-studied problem [23,24,26]. In particular, the confirmed Ehrenfeucht conjecture states that each conjunction of a set of word equations over a finite alphabet, using a finite number of variables, is equivalent to the conjunction of a finite subset of these word equations [19]. Accordingly, by a similar argument as in Sect. 4, the sequences of conjunctions $\Psi^{(h)}_{b,q}(z)$, $\Psi^{(h)}_{b,q'}(z)$, $\Phi^{(h)}_{b,(q,q')}$, $h \ge 0$, are ultimately stable. Using an encoding of words by integer matrices and applying techniques as in [19], we obtain:

**Theorem 3.** *The equivalence of total separated basic MTTs that concatenate words instead of trees in the parameters (*Δin *is unranked) is decidable.*

### **6 Related Work**

For several subclasses of attribute systems equivalence is known to be decidable. For instance, attribute grammars without inherited attributes are equivalent to deterministic top-down tree transducers (DT) [3,5]. For this class equivalence was shown to be decidable by Ésik [10]. Later, a simplified algorithm was provided in [8]. If the tree translation of an attribute grammar is of linear size increase, then equivalence is decidable, because it is decidable for deterministic macro tree transducers (DMTT) of linear size increase. This follows from the fact that the latter class coincides with the class of (deterministic) MSO definable tree translations (DMSOTT) [6] for which equivalence is decidable [7]. Figure 3 shows a Hasse diagram of classes of translations realized by certain deterministic tree transducers. The prefixes "l", "n", "sn", "b" and "sb" mean "linear size increase", "non-nested", "separated non-nested", "basic" and "separated basic", respectively. A minimal class where it is still open whether equivalence is decidable is the class of *non-nested* attribute systems (nATT) which, on the macro tree transducer side, is included in the class of *basic* deterministic macro tree transducers (bDMTT).

**Fig. 3.** Classes with and without (underlined) known decidability of equivalence

For deterministic top-down tree transducers, equivalence can be decided in EXPSPACE, and in NLOGSPACE if the transducers are total [25]. For the latter class of transducers, one can decide equivalence in polynomial time by transforming the transducer into a canonical normal form (called "earliest normal form") and then checking isomorphism of the resulting transducers [8]. In terms of hardness, we know that equivalence of deterministic top-down tree transducers is EXPTIME-hard. For linear size increase deterministic macro tree transducers the precise complexity is not known (but is at least NP-hard). More complexity results are known for other models of tree transducers such as streaming tree transducers [1], see [25] for a summary.

#### **7 Conclusion**

We have proved that the equivalence problem for separated non-nested attribute systems can be decided in polynomial time. In fact, we have shown a stronger statement, namely that equivalence of *separated basic total deterministic macro tree transducers* can be decided in polynomial time. To see that the latter is a strict superclass of the former, consider the translation that takes a binary tree as input and outputs the same tree, but under each leaf a new monadic tree is output which represents the inverse Dewey path of that node. For instance, the tree f(f(a, a), a) is translated into the tree f(f(a(1(1(e))), a(2(1(e)))), a(2(e))). A macro tree transducer of the desired class can easily realize this translation using a rule of the form q(f(x1, x2), y) → f(q(x1, 1(y)), q(x2, 2(y))). In contrast, no attribute system can realize this translation. The reason is that for every attribute system, the number of distinct output subtrees is linearly bounded by the size of the input tree. For the given translation there is no such linear bound (it is bounded by |s| log(|s|)).
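The translation can be sketched directly from the rule q(f(x1, x2), y) → f(q(x1, 1(y)), q(x2, 2(y))) (an illustrative tuple encoding; the second argument plays the role of the accumulating parameter y):

```python
# Sketch of the inverse-Dewey translation from the Conclusion. Trees are
# nested tuples: ('f', left, right) for inner nodes, ('a',) for leaves.
def trans(t, y=('e',)):
    if t[0] == 'f':
        # rule q(f(x1, x2), y) -> f(q(x1, 1(y)), q(x2, 2(y)))
        return ('f', trans(t[1], ('1', y)), trans(t[2], ('2', y)))
    return ('a', y)          # at a leaf, output the accumulated path

# f(f(a, a), a) is translated to f(f(a(1(1(e))), a(2(1(e)))), a(2(e))):
s = ('f', ('f', ('a',), ('a',)), ('a',))
assert trans(s) == ('f',
                    ('f', ('a', ('1', ('1', ('e',)))),
                          ('a', ('2', ('1', ('e',))))),
                    ('a', ('2', ('e',))))
```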

The idea of "separated", i.e., using different output alphabets, is related to the idea of transducers "with origin" [2,11]. In future work we would like to define adequate notions of origin for macro tree transducers, and prove that equivalence of such (deterministic) transducers with origin is decidable.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Justness: A Completeness Criterion for Capturing Liveness Properties (Extended Abstract)**

Rob van Glabbeek1,2(B)

<sup>1</sup> Data61, CSIRO, Sydney, Australia <sup>2</sup> Computer Science and Engineering, University of New South Wales, Sydney, Australia rvg@cs.stanford.edu

**Abstract.** This paper poses that transition systems constitute a good model of distributed systems only in combination with a criterion telling which paths model complete runs of the represented systems. Among such criteria, progress is too weak to capture relevant liveness properties, and fairness is often too strong; for typical applications we advocate the intermediate criterion of justness. Previously, we proposed a definition of justness in terms of an asymmetric concurrency relation between transitions. Here we define such a concurrency relation for the transition systems associated to the process algebra CCS as well as its extensions with broadcast communication and signals, thereby making these process algebras suitable for capturing liveness properties requiring justness.

#### **1 Introduction**

Transition systems are a common model for distributed systems. They consist of sets of states, also called *processes*, and transitions—each transition going from a source state to a target state. A given distributed system D corresponds to a state P in a transition system T—the initial state of D. The other states of D are the processes in T that are reachable from P by following the transitions. A run of D corresponds with a *path* in T: a finite or infinite alternating sequence of states and transitions, starting with P, such that each transition goes from the state before to the state after it. Whereas each finite path in T starting from P models a *partial run* of D, i.e., an initial segment of a (complete) run, typically not each path models a run. Therefore a transition system constitutes a good model of distributed systems only in combination with what we here call a *completeness criterion*: a selection of a subset of all paths as *complete paths*, modelling runs of the represented system.

A *liveness property* says that "something [good] must happen" eventually [18]. Such a property holds for a distributed system if the [good] thing happens in each of its possible runs. One of the ways to formalise this in terms of transition systems is to postulate a set of good states *G* , and say that the liveness property *G* holds for the process P if all complete paths starting in P pass through a state of *G* [16]. Without a completeness criterion the concept of a liveness property appears to be meaningless.

**Example 1.** The transition system on the right models Cataline eating a croissant in Paris. It abstracts from all activity in the world except the eating of that croissant, and thus has two states only—the states of the world before and after this event—and one transition t. We depict states by circles and transitions by arrows between them. An initial state is indicated by a short arrow without a source state. A possible liveness property says that the croissant will be eaten. It corresponds with the set of states *G* consisting of state 2 only. The states of *G* are indicated by shading.

The depicted transition system has three paths starting with state 1: 1, 1 t and 1 t 2. The path 1 t 2 models the run in which Cataline finishes the croissant. The path 1 models a run in which Cataline never starts eating the croissant, and the path 1 t models a run in which Cataline starts eating it, but never finishes. The liveness property *G* holds only when using a completeness criterion that rules out the paths 1 and 1 t as modelling actual runs of the system, leaving 1 t 2 as the sole complete path.
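As a toy illustration (an invented encoding, not part of the paper), one can enumerate the three paths of Example 1 and check the liveness property *G* against different choices of complete paths:

```python
# Illustrative encoding of Example 1: states 1 and 2, transition 't' from
# 1 to 2. A liveness property G holds iff every complete path visits G.
paths = [[1], [1, 't'], [1, 't', 2]]     # the three paths starting in 1

def holds(G, complete_paths):
    return all(any(s in G for s in p if not isinstance(s, str))
               for p in complete_paths)

G = {2}
# Without a completeness criterion (every path counts as complete), the
# property fails, because the paths 1 and 1 t never reach state 2:
assert not holds(G, paths)
# A criterion selecting 1 t 2 as the sole complete path makes G hold:
assert holds(G, [[1, 't', 2]])
```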

The transitions of transition systems can be understood to model atomic actions that can be performed by the represented systems. Although we allow these actions to be instantaneous or durational, in the remainder of this paper we adopt the assumption that "atomic actions always terminate" [23]. This is a partial completeness criterion. It rules out the path 1 t in Example 1. We build in this assumption in the definition of a path by henceforth requiring that finite paths should end with a state.

*Progress.* The most widely employed completeness criterion is *progress*.<sup>1</sup> In the context of *closed systems*, having no run-time interactions with the environment, it is the assumption that a run will never get stuck in a state with outgoing transitions. This rules out the path 1 in Example 1, as t is outgoing. When adopting progress as completeness criterion, the liveness property *G* holds for the system modelled in Example 1.

Progress is assumed in almost all work on process algebra that deals with liveness properties, mostly implicitly. Milner makes an explicit progress assumption for the process algebra CCS in [20]. A progress assumption is built into the temporal logics LTL [24], CTL [7] and CTL\* [8], namely by disallowing states without outgoing transitions and evaluating temporal formulas by quantifying over infinite paths only.<sup>2</sup> In [17] the 'multiprogramming axiom' is a progress assumption, whereas in [1] progress is assumed as a 'fundamental liveness property'.

<sup>1</sup> Misra [21,22] calls this the 'minimal progress assumption'. In [22] he uses 'progress' as a synonym for 'liveness'. In session types, 'progress' and 'global progress' are used as names of particular liveness properties [4]; this use has no relation with ours.

<sup>2</sup> Exceptionally, states without outgoing transitions are allowed, and then quantification is over all *maximal* paths, i.e. paths that are infinite or end in a state without outgoing transitions [5].

As we argued in [10,15,16], a progress assumption as above is too strong in the context of reactive systems, meaning that it rules out as incomplete too many paths. There, a transition typically represents an interaction between the distributed system being modelled and its environment. In many cases a transition can occur only if both the modelled system *and* the environment are ready to engage in it. We therefore distinguish *blocking* and *non-blocking* transitions. A transition is non-blocking if the environment cannot or will not block it, so that its execution is entirely under the control of the system under consideration. A blocking transition on the other hand may fail to occur because the environment is not ready for it. The same was done earlier in the setting of Petri nets [26], where blocking and non-blocking transitions are called *cold* and *hot*, respectively.

In [10,15,16] we worked with transition systems that are equipped with a partitioning of the transitions into blocking and non-blocking ones, and reformulated the progress assumption as follows:

*a (transition) system in a state that admits a non-blocking transition will eventually progress, i.e., perform a transition.*

In other words, a run will never get stuck in a state with outgoing non-blocking transitions. In Example 1, when adopting progress as our completeness criterion, we assume that Cataline actually wants to eat the croissant, and does not willingly remain in State 1 forever. When that assumption is unwarranted, one would model her behaviour by a transition system different from that of Example 1. However, she may still be stuck in State 1 by lack of any croissant to eat. If we want to model the capability of the environment to withhold a croissant, we classify t as a blocking transition, and the liveness property *G* does not hold. If we abstract from a possible shortage of croissants, t is deemed a non-blocking transition, and, when assuming progress, *G* holds.

As an alternative approach to a dogmatic division of transitions in a transition system, we could shift the status of transitions to the progress property, and speak of B-progress when B is the set of blocking transitions. In that approach, *G* holds for State 1 of Example 1 under the assumption of B-progress when t ∉ B, but not when t ∈ B.

*Justness.* Justness is a completeness criterion proposed in [10,15,16]. It strengthens progress. It can be argued that once one adopts progress it makes sense to go a step further and adopt even justness.

**Example 2.** The transition system on the top right models Alice making an unending sequence of phone calls in London. There is no interaction of any kind between Alice and Cataline. Yet, we may choose to abstract from all activity in the world except the eating of the croissant by Cataline, and the making of calls by Alice. This yields the combined transition system on the bottom right. Even when taking the

transition t to be non-blocking, progress is not a strong enough completeness criterion to ensure that Cataline will ever eat the croissant. For the infinite path that loops in the first state is complete. Nevertheless, as nothing stops Cataline from making progress, in reality t will occur [16].

This example is not a contrived corner case, but a rather typical illustration of an issue that is central to the study of distributed systems. Other illustrations of this phenomenon occur in [10, Section 9.1], [14, Section 10], [11, Section 1.4], [12] and [6, Section 4]. The criterion of justness aims to ensure the liveness property occurring in these examples. In [16] it is formulated as follows:

*Once a non-blocking transition is enabled that stems from a set of parallel components, one (or more) of these components will eventually partake in a transition.*

In Example 2, t is a non-blocking transition enabled in the initial state. It stems from the single parallel component Cataline of the distributed system under consideration. Justness therefore requires that Cataline must partake in a transition. This can only be t, as all other transitions involve component Alice only. Hence justness says that t must occur. The infinite path starting in the initial state and not containing t is ruled out as unjust, and thereby incomplete.

In [13,16] we explain how justness is fundamentally different from fairness, and why fairness is too strong a completeness criterion for many applications.

Unlike progress, the concept of justness as formulated above is in need of some formalisation, i.e., to formally define a component, to make precise for concrete transition systems what it means for a transition to stem from a set of components, and to define when a component partakes in a transition.

A formalisation of justness for the transition system generated by the process algebra AWN, the *Algebra for Wireless Networks* [9], was provided in [10]. In the same vein, [15] offered a formalisation for the transition systems generated by CCS [20], and its extension ABC, the *Algebra of Broadcast Communication* [15], a variant of CBS, the *Calculus of Broadcasting Systems* [25]. The same was done for CCS extended with *signals* in [6]. These formalisations coinductively define B*-justness*, where B ranges over sets of transitions deemed to be blocking, as a family of predicates on paths, and proceed by a case distinction on the operators in the language. Although these definitions *do* capture the concept of justness formulated above, it is not easy to see why.

A more syntax-independent formalisation of justness occurs in [16]. There it is defined directly on transition systems equipped with a, possibly asymmetric, concurrency relation between transitions. However, the concurrency relation itself is defined only for the transition system generated by a fragment of CCS, and the generalisation to full CCS, and other process algebras, is non-trivial.

It is the purpose of this paper to make the definition of justness from [16] available to a large range of process algebras by defining the concurrency relation for CCS, for ABC, and for the extension of CCS with signals used in [6]. We do this in a precise as well as in an approximate way, and show that both approaches lead to the same concept of justness. Moreover, in all cases we establish a closure property on the concurrency relation ensuring that justness is a meaningful notion. We show that for all these algebras justness is *feasible*. Here feasibility is a requirement on completeness criteria advocated in [1,16,19]. Finally, we establish agreement between the formalisation of justness from [16] and the present paper, and the original coinductive ones from [15] and [6].

# **2 Labelled Transition Systems with Concurrency**

We start with the formal definitions of a labelled transition system, a path, and the completeness criterion *progress*, which is parametrised by the choice of a collection B of blocking actions. Then we define the completeness criterion *justness* on labelled transition systems upgraded with a concurrency relation.

**Definition 1.** A *labelled transition system* (LTS) is a tuple (S, *Tr*, *src*, *target*, ℓ) with S and *Tr* sets (of *states* and *transitions*), *src*, *target* : *Tr* → S and ℓ : *Tr* → *L*, for some set of transition labels *L*.

Here we work with LTSs labelled over a structured set of labels (*L*, Act, *Rec*), where *Rec* ⊆ Act ⊆ *L*. Labels in Act are *actions*; the ones in *L* \ Act are *signals*. Transitions labelled with actions model a state change in the represented system; signal transitions do not—they satisfy *src*(t) = *target*(t) and merely convey a property of a state. *Rec* ⊆ Act is the set of *receptive* actions; sets B ⊆ Act of blocking actions must always contain *Rec*. In CCS and most other process algebras *Rec* = ∅ and Act = *L*. Let *Tr*• = {t ∈ *Tr* | ℓ(t) ∈ Act \ *Rec*} be the set of transitions that are neither signals nor receptive.

**Definition 2.** A *path* in a transition system (S, *Tr*, *src*, *target*) is an alternating sequence s₀ t₁ s₁ t₂ s₂ ··· of states and non-signal transitions, starting with a state and either being infinite or ending with a state, such that *src*(tᵢ) = sᵢ₋₁ and *target*(tᵢ) = sᵢ for all relevant i.

A *completeness criterion* is a unary predicate on the paths in a transition system.

**Definition 3.** Let B ⊆ Act be a set of actions with *Rec* ⊆ B—the *blocking* ones. Then *Tr*•¬B := {t ∈ *Tr*• | ℓ(t) ∉ B} is the set of *non-blocking* transitions. A path in T is B*-progressing* if either it is infinite or its last state is the source of no non-blocking transition t ∈ *Tr*•¬B.

B-progress is a completeness criterion for any choice of B ⊆ Act with *Rec* ⊆ B.
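For finite paths, B-progress can be sketched as a simple predicate (a hypothetical dictionary encoding of the one-transition LTS of Example 1; the label `'eat'` is invented for illustration):

```python
# Sketch: B-progress for finite paths. Transition 't' goes from state 1
# to state 2; Rec is empty, as in CCS.
src, target, label = {'t': 1}, {'t': 2}, {'t': 'eat'}
Rec = set()

def progressing(path, B):
    # A finite path is B-progressing iff its last state enables no
    # non-blocking transition, i.e. none with a label outside B ∪ Rec.
    last = path[-1]
    return not any(src[t] == last and label[t] not in B | Rec for t in src)

assert not progressing([1], B=set())      # t is outgoing and non-blocking
assert progressing([1, 't', 2], B=set())  # state 2 enables nothing
assert progressing([1], B={'eat'})        # with t blocking, 1 is complete
```

Infinite paths are B-progressing by definition, so only the finite case needs checking here.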

**Definition 4.** A *labelled transition system with concurrency* (LTSC) is a tuple (S, *Tr*, *src*, *target*, ℓ, ⌣•) consisting of an LTS (S, *Tr*, *src*, *target*, ℓ) and a *concurrency relation* ⌣• ⊆ *Tr*• × *Tr*, such that:

$$t \not\smile^\bullet t \text{ for all } t \in Tr^{\bullet},\tag{1}$$

if t ∈ *Tr*• and π is a path from *src*(t) to s ∈ S such that t ⌣• v for all transitions v occurring in π, then there is a u ∈ *Tr*• such that *src*(u) = s, ℓ(u) = ℓ(t) and t ⌣• u. (2)

Informally, t ⌣• v means that the transition v does not interfere with t, in the sense that it does not affect any resources that are needed by t, so that in a state where t and v are both possible, after doing v one can still do (a future variant u of) t. In many transition systems ⌣• is a symmetric relation, denoted ⌣.

The transition relation in a labelled transition system is often defined as a relation *Tr* <sup>⊆</sup> <sup>S</sup> <sup>×</sup> *<sup>L</sup>* <sup>×</sup> <sup>S</sup>. This approach is not suitable here, as we will encounter multiple transitions with the same source, target and label that ought to be distinguished based on their concurrency relations with other transitions.

**Definition 5.** A path π in an LTSC is B*-just*, for *Rec* ⊆ B ⊆ Act, if for each transition t ∈ *Tr*•¬B with s := *src*(t) ∈ π, a transition u occurs in the suffix of π starting at s, such that ¬(t ⌣• u).

Informally, justness requires that once a non-blocking non-signal transition t is enabled, sooner or later a transition u will occur that interferes with it, possibly <sup>t</sup> itself. Note that, for any *Rec* <sup>⊆</sup> <sup>B</sup> <sup>⊆</sup> Act, <sup>B</sup>-justness is a completeness criterion stronger than B-progress.

*Components.* Instead of introducing ⌣• as a primitive, it is possible to obtain it as a notion derived from two functions *npc*, *afc* : *Tr* → 𝒫(*C*), for a given set of *components C*. These functions could then be added as primitives to the definition of an LTS. They are based on the idea that a process represents a system built from parallel components. Each transition is obtained as a synchronisation of activities from some of these components. Now *npc*(t) describes the (nonempty) set of components that are *necessary participants* in the execution of t, whereas *afc*(t) describes the components that are *affected* by the execution of t. The concurrency relation is then defined by

$$t \smile^\bullet u \quad \Leftrightarrow \quad npc(t) \cap afc(u) = \emptyset$$

saying that u interferes with t iff a necessary participant in t is affected by u.
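A small sketch of how ⌣• can be derived from *npc* and *afc*, together with a finite-path reading of B-justness for B = ∅ (the two-component system and all names are invented for illustration, and the finite-path check is only an approximation of Definition 5, which also covers infinite paths):

```python
# Two independent one-shot components: Alice ('A') makes one call and
# Cataline ('C') eats once; each transition involves one component only.
npc = {'a': {'A'}, 'a1': {'A'}, 't': {'C'}, 't1': {'C'}}  # participants
afc = dict(npc)                                           # affected comps
src = {'a': 's00', 't': 's00', 'a1': 's01', 't1': 's10'}

def concurrent(t, u):
    # t ⌣• u  iff  npc(t) ∩ afc(u) = ∅  (u does not interfere with t)
    return not (npc[t] & afc[u])

def just(path):
    # path alternates states and transitions; B = ∅, so every enabled
    # transition must eventually be interfered with by some later one.
    states, trans = path[0::2], path[1::2]
    for i, s in enumerate(states):
        for t in src:
            if src[t] == s and all(concurrent(t, u) for u in trans[i:]):
                return False    # t stays enabled, never interfered with
    return True

assert just(['s00', 'a', 's10', 't1', 's11'])  # both components act
assert not just(['s00', 'a', 's10'])           # Cataline's t1 neglected
```

The second assertion mirrors Example 2 in miniature: a path in which only Alice acts is ruled out as unjust, even though it is progressing.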

Most material above stems from [16]. However, there *Tr*• = *Tr*, so that ⌣• is irreflexive, i.e., *npc*(t) ∩ *afc*(t) ≠ ∅ for all t ∈ *Tr*. Moreover, a fixed set B is postulated, so that the notions of progress and justness are not explicitly parametrised with the choice of B. Furthermore, property (2) is new here; it is the weakest closure property that supports Theorem 1 below. In [16] only the model in which ⌣• is derived from *npc* and *afc* comes with a closure property:

$$\begin{array}{l}\text{If } t, v \in Tr^\bullet \text{ with } src(t) = src(v) \text{ and } npc(t) \cap afc(v) = \emptyset, \text{ then} \\\exists u \in Tr^\bullet \text{ with } src(u) = target(v), \, \ell(u) = \ell(t) \text{ and } npc(u) = npc(t). \end{array} \tag{3}$$

Trivially (3) implies (2).

An important requirement on completeness criteria is that any finite path can be extended into a complete path. This requirement was proposed by Apt, Francez and Katz in [1] and called *feasibility*. It also appears in Lamport [19] under the name *machine closure*. The theorem below lists conditions under which B-justness is feasible. Its proof is a variant of a similar theorem from [16] showing conditions under which notions of strong and weak fairness are feasible.


**Table 1.** Structural operational semantics of CCS

**Theorem 1.** If, in an LTSC with set of blocking actions B, only countably many transitions from *Tr*•¬B are enabled in each state, then B-justness is feasible.

All proofs can be found in the full version of this paper [13].

### **3 CCS and Its Extensions with Broadcast and Signals**

This section presents four process algebras: Milner's *Calculus of Communicating Systems* (CCS) [20], its extensions with broadcast communication ABC [15] and signals CCSS [6], and an alternative presentation of ABC that avoids negative premises in favour of *discard* transitions.

#### **3.1 CCS**

CCS [20] is parametrised with sets *A* of *agent identifiers* and *C*h of *(handshake communication) names*; each A ∈ *A* comes with a defining equation A *def*= P with P being a CCS expression as defined below. *C̄*h := {c̄ | c ∈ *C*h} is the set of *co-names*. Complementation is extended to *C̄*h by setting $\bar{\bar{c}} = c$. Act := *C*h ∪ *C̄*h ∪ {τ} is the set of *actions*, where τ is a special *internal action*. Below, c ranges over *C*h ∪ *C̄*h, η and α over Act, and A, B over *A*. A *relabelling* is a function f : *C*h → *C*h; it extends to Act by $f(\bar{c}) = \overline{f(c)}$ and f(τ) := τ. The set PCCS of CCS expressions or *processes* is the smallest set including:


One often abbreviates α.**0** by α, and P\{c} by P\c. The traditional semantics of CCS is given by the labelled transition relation → ⊆ PCCS × Act × PCCS, where transitions $P \xrightarrow{\alpha} Q$ are derived from the rules of Table 1.
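To make the rule format concrete, here is a sketch of a transition-derivation function for a CCS fragment (an invented tuple encoding; it covers prefixing, choice, parallel composition with handshake into τ, and restriction, and omits relabelling and agent identifiers):

```python
# Sketch of the CCS transition rules of Table 1 for a fragment.
# Terms: ('nil',), ('pre', a, P), ('sum', P, Q), ('par', P, Q), ('res', P, c).
def bar(c):
    return c[1:] if c.startswith('~') else '~' + c   # co-name: ~c for c̄

def steps(p):
    """Yield all pairs (label, successor) derivable from p."""
    tag = p[0]
    if tag == 'pre':                     # (Act): a.P --a--> P
        yield p[1], p[2]
    elif tag == 'sum':                   # choice between P and Q
        yield from steps(p[1])
        yield from steps(p[2])
    elif tag == 'par':                   # (Par-l), (Comm), (Par-r)
        P, Q = p[1], p[2]
        for a, P2 in steps(P):
            yield a, ('par', P2, Q)
            for b, Q2 in steps(Q):       # handshake of a with its co-name
                if a != 'tau' and b == bar(a):
                    yield 'tau', ('par', P2, Q2)
        for b, Q2 in steps(Q):
            yield b, ('par', P, Q2)
    elif tag == 'res':                   # (Res): P\c blocks c and ~c
        c = p[2]
        for a, P2 in steps(p[1]):
            if a not in (c, bar(c)):
                yield a, ('res', P2, c)

NIL = ('nil',)
proc = ('res', ('par', ('pre', 'c', NIL), ('pre', '~c', NIL)), 'c')
assert [a for a, _ in steps(proc)] == ['tau']   # only the handshake remains
```

The final assertion illustrates how restriction forces the two components of (c.0 | c̄.0)\c to synchronise into a τ-step.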

**Table 2.** Structural operational semantics of ABC broadcast communication


#### **3.2 ABC—The Algebra of Broadcast Communication**

The Algebra of Broadcast Communication (ABC) [15] is parametrised with sets *A* of *agent identifiers*, *B* of *broadcast names* and *C*h of *handshake communication names*; each A ∈ *A* comes with a defining equation A *def*= P with P being a guarded ABC expression as defined below.

The collections *B*! and *B*? of *broadcast* and *receive* actions are given by *B*♯ := {b♯ | b ∈ *B*} for ♯ ∈ {!, ?}. Act := *B*! ∪ *B*? ∪ *C*h ∪ *C̄*h ∪ {τ} is the set of *actions*. Below, A ranges over *A*, b over *B*, c over *C*h ∪ *C̄*h, η over *C*h ∪ *C̄*h ∪ {τ} and α over Act. A *relabelling* is a function f : (*B* → *B*) ∪ (*C*h → *C*h). It extends to Act by $f(\bar{c}) = \overline{f(c)}$, f(b♯) = f(b)♯ and f(τ) := τ. The set PABC of ABC expressions is defined exactly as PCCS. An expression is guarded if each agent identifier occurs within the scope of a prefixing operator. The structural operational semantics of ABC is the same as the one for CCS (see Table 1) but augmented with the rules for broadcast communication in Table 2.

ABC is CCS augmented with a formalism for broadcast communication taken from the Calculus of Broadcasting Systems (CBS) [25]. The syntax without the broadcast and receive actions and all rules except (Bro-l), (Bro-c) and (Bro-r) are taken verbatim from CCS. However, the rules now cover the different name spaces; (Act) for example allows labels of broadcast and receive actions. The rule (Bro-c)—in the absence of rules like (Par-l) and (Par-r) with label $b!$—implements a form of broadcast communication where any broadcast $b!$ performed by a component in a parallel composition is guaranteed to be received by any other component that is ready to do so, i.e., in a state that admits a $b?$-transition. In order to ensure associativity of the parallel composition, one also needs this rule for components receiving at the same time ($\sharp_1 = \sharp_2 = {?}$). The rules (Bro-l) and (Bro-r) are added to make broadcast communication *nonblocking*: without them a component could be delayed in performing a broadcast simply because one of the other components is not ready to receive it.
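The interplay of these rules on labels can be sketched in Python. The fragment below (our own simplification, not the paper's full SOS: it only combines labels and covers only the $b!$ case of (Bro-l)/(Bro-r)) composes the broadcast labels of the two sides of a parallel composition, using the partial operation $!\circ? = ?\circ! = {!}$ and $?\circ? = {?}$, with $!\circ!$ undefined:

```python
# Simplified sketch (ours): combining broadcast labels of P|Q per the
# rules (Bro-l), (Bro-c), (Bro-r). Only the b! case of (Bro-l)/(Bro-r)
# is modelled; handshake and tau labels are omitted.

def bro_c(s1, s2):
    """(Bro-c): synchronise two components acting on the same name."""
    if {s1, s2} == {"!", "?"}:
        return "!"          # a broadcast met by a receiver stays a broadcast
    if s1 == s2 == "?":
        return "?"          # two receivers jointly receive
    return None             # two simultaneous broadcasts do not synchronise

def par_labels(left, right):
    """Broadcast labels of P|Q, given those of P and Q, as sets of
    pairs (b, sharp) with sharp in {'!', '?'}."""
    out = set()
    for b, s1 in left:
        for b2, s2 in right:
            if b == b2 and bro_c(s1, s2):
                out.add((b, bro_c(s1, s2)))      # (Bro-c)
        # (Bro-l), nonblocking: P broadcasts alone iff Q cannot receive b
        if s1 == "!" and (b, "?") not in right:
            out.add((b, "!"))
    for b, s2 in right:                          # (Bro-r), symmetric
        if s2 == "!" and (b, "?") not in left:
            out.add((b, "!"))
    return out
```

This reproduces the intended behaviour: a broadcast is never blocked, and a ready receiver is always taken along.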

#### **3.3 CCS with Signals**

*CCS with signals* (CCSS) [6] is CCS extended with a signalling operator $P\hat{\ }s$. Informally, $P\hat{\ }s$ emits the signal $s$, to be read by another process. $P\hat{\ }s$ could for instance be a traffic light emitting the signal *red*. The reading of the signal emitted by $P\hat{\ }s$ does not interfere with any transition of $P$, such as jumping to *green*. Formally, CCS is extended with a set $\mathcal{S}$ of *signals*, ranged over by $s$ and $r$. In CCSS the set of actions is defined as $\mathit{Act} := \mathcal{S} \mathbin{\dot\cup} C_h \cup \bar{C}_h \mathbin{\dot\cup} \{\tau\}$, and the set


**Table 3.** Structural operational semantics for signals of CCSS

of labels by $\mathcal{L} := \mathit{Act} \mathbin{\dot\cup} \bar{\mathcal{S}}$, where $\bar{\mathcal{S}} := \{\bar{s} \mid s \in \mathcal{S}\}$. A relabelling is a function $f : (\mathcal{S} \to \mathcal{S}) \cup (C_h \to C_h)$. It extends to $\mathcal{L}$ by $f(\bar{c}) = \overline{f(c)}$ for $c \in C_h \cup \mathcal{S}$ and $f(\tau) := \tau$. The set $\mathbb{P}_{\mathrm{CCSS}}$ of CCSS expressions is defined just as $\mathbb{P}_{\mathrm{CCS}}$, but now also $P\hat{\ }s$ is a process for $P \in \mathbb{P}_{\mathrm{CCSS}}$ and $s \in \mathcal{S}$, and restriction also covers signals.

The semantics of CCSS is given by the labelled transition relation $\to \subseteq \mathbb{P}_{\mathrm{CCSS}} \times \mathcal{L} \times \mathbb{P}_{\mathrm{CCSS}}$ derived from the rules of CCS (Table 1), where now $\eta$, $\ell$ range over $\mathcal{L}$, $\alpha$ over $\mathit{Act}$, $c$ over $C_h \cup \mathcal{S}$ and $L \subseteq C_h \cup \mathcal{S}$, augmented with the rules of Table 3. The first rule is the base case, showing that a process $P\hat{\ }s$ emits the signal $s$. The second rule models the fact that signalling cannot prevent a process from making progress.

The original semantics of CCSS [6] featured unary predicates on processes to model that $P$ emits the signal $s$; here, inspired by [3], these predicates are represented as transitions $P \stackrel{\bar{s}}{\longrightarrow} P$. Whereas this leads to a simpler operational semantics, the price paid is that these new *signal transitions* need special treatment in the definition of justness—cf. Definitions 2 and 5.

#### **3.4 Using Signals to Avoid Negative Premises in ABC**

Finally, we present an alternative operational semantics ABCd of ABC that avoids negative premises. The price to be paid is the introduction of signals that indicate when a state does not admit a receive action.<sup>3</sup> To this end, let $\mathcal{B}{:} := \{b{:} \mid b \in \mathcal{B}\}$ be the set of *broadcast discards*, and $\mathcal{L} := \mathcal{B}{:} \mathbin{\dot\cup} \mathit{Act}$ the set of *transition labels*, with $\mathit{Act}$ as in Sect. 3.2. The semantics is given by the labelled transition relation $\to \subseteq \mathbb{P}_{\mathrm{ABC}} \times \mathcal{L} \times \mathbb{P}_{\mathrm{ABC}}$ derived from the rules of CCS (Table 1), where now $c$ ranges over $C_h \cup \bar{C}_h$, $\eta$ over $C_h \cup \bar{C}_h \cup \{\tau\}$, $\alpha$ over $\mathit{Act}$ and $\ell$ over $\mathcal{L}$, augmented with the rules of Table 4.

**Lemma 1.** [25] $P \stackrel{b:}{\longrightarrow} Q$ iff $Q = P \wedge P \stackrel{b?}{\not\longrightarrow}$, for $P, Q \in \mathbb{P}_{\mathrm{ABC}}$ and $b \in \mathcal{B}$.

So the structural operational semantics of ABC from Sects. 3.2 and 3.4 yield the same labelled transition relation $\longrightarrow$ when transitions labelled $b{:}$ are ignored. This approach stems from the Calculus of Broadcasting Systems (CBS) [25].

<sup>3</sup> A state $P$ admits an action $\alpha \in \mathit{Act}$ if there exists a transition $P \stackrel{\alpha}{\longrightarrow} Q$.

**Table 4.** SOS of ABC broadcast communication with discard transitions


#### **4 An LTS with Concurrency for CCS and Its Extensions**

The forthcoming material applies to each of the process algebras from Sect. 3, or combinations thereof. Let P be the set of processes in the language.

We allocate an LTS as in Definition 1 to these languages by taking $S$ to be the set $\mathbb{P}$ of processes, and $\mathit{Tr}$ the set of *derivations* $t$ of transitions $P \stackrel{\ell}{\longrightarrow} Q$ with $P, Q \in \mathbb{P}$. Of course $\mathit{src}(t) = P$, $\mathit{target}(t) = Q$ and $\ell(t) = \ell$. Here a *derivation* of a transition $P \stackrel{\ell}{\longrightarrow} Q$ is a well-founded tree with the nodes labelled by transitions, such that the root has label $P \stackrel{\ell}{\longrightarrow} Q$, and if $\mu$ is the label of a node and $K$ is the set of labels of the children of this node then $\frac{K}{\mu}$ is an instance of a rule of Tables 1, 2, 3 and 4.

We take *Rec* := *B*? in ABC and ABCd: broadcast receipts can always be blocked by the environment, namely by not broadcasting the requested message. For CCS and CCSS we take *Rec* := ∅, thus allowing environments that can always participate in certain handshakes, and/or always emit certain signals.

Following [15], we give a name to any derivation of a transition: the unique derivation of the transition $\alpha.P \stackrel{\alpha}{\longrightarrow} P$ using the rule (Act) is called $\stackrel{\alpha}{\to}P$. The unique derivation of the transition $P\hat{\ }s \stackrel{\bar{s}}{\longrightarrow} P\hat{\ }s$ is called $P{\to}s$. The derivation obtained by application of (Comm) or (Bro-c) on the derivations $t$ and $u$ of the premises of that rule is called $t|u$. The derivation obtained by application of (Par-l) or (Bro-l) on the derivation $t$ of the (positive) premise of that rule, and using process $Q$ at the right of $|$, is $t|Q$. In the same way, (Par-r) and (Bro-r) yield $P|u$, whereas (Sum-l), (Sum-r), (Res), (Rel) and (Rec) yield $t{+}Q$, $P{+}t$, $t\backslash L$, $t[f]$ and $A{:}t$. These names reflect syntactic structure: $t|P \neq P|t$ and $(t|u)|v \neq t|(u|v)$.

Table 3, moreover, contributes derivations $t\hat{\ }r$. The derivations obtained by application of the rules of Table 4 are called $b{:}\mathbf{0}$, $b{:}\alpha.P$, $t{+}u$, $t|u$ and $A{:}t$, where $t$ and $u$ are the derivations of the premises.

*Synchrons.* Let $\mathit{Arg} := \{+_L, +_R, |_L, |_R, \backslash L, [f], A{:}, \hat{\ }r \mid L \subseteq C_h \wedge f \text{ a relabelling} \wedge A \in \mathcal{A} \wedge r \in \mathcal{S}\}$. A *synchron* is an expression $\sigma(\stackrel{\alpha}{\to}P)$ or $\sigma(P{\to}s)$ or $\sigma(b{:})$ with $\sigma \in \mathit{Arg}^*$, $\alpha \in \mathit{Act}$, $s \in \mathcal{S}$, $P \in \mathbb{P}$ and $b \in \mathcal{B}$. An *argument* $\iota \in \mathit{Arg}$ is applied componentwise to a set $\Sigma$ of synchrons: $\iota(\Sigma) := \{\iota\varsigma \mid \varsigma \in \Sigma\}$.

The set of synchrons ς(t) of a derivation t of a transition is defined by

$$\begin{array}{rclrclrcl}
\varsigma(\stackrel{\alpha}{\to}P) &=& \{(\stackrel{\alpha}{\to}P)\} &
\varsigma(t{+}Q) &=& +_L\varsigma(t) &
\varsigma(P{+}t) &=& +_R\varsigma(t)\\
\varsigma(t|Q) &=& |_L\varsigma(t) &
\varsigma(t|u) &=& |_L\varsigma(t) \cup |_R\varsigma(u) &
\varsigma(P|u) &=& |_R\varsigma(u)\\
\varsigma(t\backslash L) &=& \backslash L\,\varsigma(t) &
\varsigma(t[f]) &=& [f]\,\varsigma(t) &
\varsigma(A{:}t) &=& A{:}\,\varsigma(t)\\
\varsigma(P{\to}s) &=& \{(P{\to}s)\} &
\varsigma(t\hat{\ }r) &=& \hat{\ }r\,\varsigma(t)\\
\varsigma(b{:}\mathbf{0}) &=& \{(b{:})\} &
\varsigma(b{:}\alpha.P) &=& \{(b{:})\} &
\varsigma(t{+}u) &=& +_L\varsigma(t) \cup +_R\varsigma(u)
\end{array}$$

Thus, a synchron of t represents a path in the proof-tree t from its root to a leaf. Each transition derivation can be seen as the synchronisation of one or more synchrons. Note that we use the symbol ς as a variable ranging over synchrons, and as the name of a function—context disambiguates.
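The recursive equations defining $\varsigma$ can be transcribed directly. The following Python sketch (constructor names are our own) computes the synchrons of a derivation encoded as a nested tuple; it covers only the CCS rules used in Example 3 below, the remaining equations extending it in the same way:

```python
# Sketch (encoding ours): derivations as nested tuples, synchrons as
# tuples of arguments ending in a leaf. Covers the CCS fragment only.

def synchrons(t):
    """Compute the set ς(t) of a derivation, following the equations above."""
    tag = t[0]
    if tag == "act":                          # leaf derivation  ->α P
        return {(t,)}
    if tag == "sumL":                         # t'+Q   via (Sum-l)
        return {("+L",) + s for s in synchrons(t[1])}
    if tag == "sumR":                         # P+t'   via (Sum-r)
        return {("+R",) + s for s in synchrons(t[2])}
    if tag == "parL":                         # t'|Q   via (Par-l)
        return {("|L",) + s for s in synchrons(t[1])}
    if tag == "parR":                         # P|t'   via (Par-r)
        return {("|R",) + s for s in synchrons(t[2])}
    if tag == "comm":                         # t'|u'  via (Comm)
        return ({("|L",) + s for s in synchrons(t[1])}
                | {("|R",) + s for s in synchrons(t[2])})
    if tag == "res":                          # t'\L   via (Res)
        return {("\\L",) + s for s in synchrons(t[1])}
    raise ValueError("unknown constructor: %s" % tag)
```

A synchronisation such as (Comm) is the only constructor that merges the synchron sets of two subderivations, which is why every other derivation has exactly as many synchrons as its single premise.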

**Example 3.** The CCS process $P = \big((c.Q + (d.R|e.S))\,\big|\,\bar{c}.T\big)\backslash c$ has 3 outgoing transitions: $P \stackrel{\tau}{\longrightarrow} (Q|T)\backslash c$, $P \stackrel{d}{\longrightarrow} ((R|e.S)|\bar{c}.T)\backslash c$ and $P \stackrel{e}{\longrightarrow} ((d.R|S)|\bar{c}.T)\backslash c$. Let $t_\tau$, $t_d$ and $t_e \in \mathit{Tr}$ be the unique derivations of these transitions. Then $t_\tau$ is a synchronisation of two synchrons, whereas $t_d$ and $t_e$ have only one each: $\varsigma(t_\tau) = \{\backslash c\,|_L{+_L}(\stackrel{c}{\to}Q),\ \backslash c\,|_R(\stackrel{\bar{c}}{\to}T)\}$, $\varsigma(t_d) = \{\backslash c\,|_L{+_R}|_L(\stackrel{d}{\to}R)\}$ and $\varsigma(t_e) = \{\backslash c\,|_L{+_R}|_R(\stackrel{e}{\to}S)\}$. The derivations $t_d$ and $t_e$ can be seen as *concurrent*, because their synchrons come from opposite sides of the same parallel composition; one would expect that after one of them occurs, a variant of the other is still possible. Indeed, there is a transition $((d.R|S)|\bar{c}.T)\backslash c \stackrel{d}{\longrightarrow} ((R|S)|\bar{c}.T)\backslash c$. Let $t'_d$ be its unique derivation. The derivations $t_d$ and $t'_d$ are surely different, for they have a different source state. Even their synchrons are different: $\varsigma(t'_d) = \{\backslash c\,|_L|_L(\stackrel{d}{\to}R)\}$. Nevertheless, $t'_d$ can be recognised as a future variant of $t_d$: its only synchron has merely lost an argument $+_R$. This choice got resolved when taking the transition $t_e$.

We proceed to formalise the concepts "future variant" and "concurrent" that occur above, by defining two binary relations ${\leadsto} \subseteq \mathit{Tr}^\bullet \times \mathit{Tr}^\bullet$ (written ❀) and ${\smile^\bullet} \subseteq \mathit{Tr}^\bullet \times \mathit{Tr}$ (written ⌣•) such that the following properties hold:


$$\text{If } t \smile v \text{ with } src(t) = src(v) \text{ then } \exists t' \text{ with } src(t') = target(v) \text{ and } t \smile t'. \tag{6}$$

$$\text{If } t \leadsto t' \text{ then } \ell(t') = \ell(t) \text{ and } t \not\succ t'. \tag{7}$$

With t ⌣• v we mean that the possible occurrence of $t$ is unaffected by the occurrence of $v$. Although for CCS the relation ⌣• is symmetric (and $\mathit{Tr}^\bullet = \mathit{Tr}$), for ABC and CCSS it is not:

**Example 4 (**[15]**).** Let $P$ be the process $b!\,|\,(b? + c)$, and let $t$ and $v$ be the derivations of the $b!$- and $c$-transitions of $P$. The broadcast $b!$ is in our view completely under the control of the left component; it will occur regardless of whether the right component listens to it or not. It so happens that if $b!$ occurs in state $P$, the right component will listen to it, thereby disabling the possible occurrence of $c$. For this reason we have t ⌣• v but not v ⌣• t.

**Example 5.** Let $P$ be the process $a\hat{\ }s\,|\,s$, and let $t$ and $v$ be the derivations of the $a$- and $\tau$-transitions of $P$. The occurrence of $a$ disrupts the emission of the signal $s$, thereby disabling the $\tau$-transition. However, reading the signal does not affect the possible occurrence of $a$. For this reason we have t ⌣• v but not v ⌣• t.

**Proposition 1.** Assume (4)–(7). Then the LTS $(\mathbb{P}, \mathit{Tr}, \mathit{src}, \mathit{target}, \ell)$, augmented with the concurrency relation ⌣•, is an LTSC in the sense of Definition 4.

We now proceed to define the relations ❀ and ⌣• on synchrons, and then lift them to derivations. Subsequently, we establish (4)–(7).

The elements $+_L$, $+_R$, $A{:}$ and $\hat{\ }r$ of $\mathit{Arg}$ are called *dynamic* [20]; the others are *static*. (Static operators stay around when their arguments perform transitions.) For $\sigma \in \mathit{Arg}^*$ let $\mathit{static}(\sigma)$ be the result of removing all dynamic elements from $\sigma$. For $\varsigma = \sigma\upsilon$ with $\upsilon \in \{(\stackrel{\alpha}{\to}P), (P{\to}s), (b{:})\}$ let $\mathit{static}(\varsigma) := \mathit{static}(\sigma)\upsilon$.

**Definition 6.** A synchron $\varsigma'$ is a *possible successor* of a synchron $\varsigma$, notation $\varsigma \leadsto \varsigma'$, if either $\varsigma' = \varsigma$, or $\varsigma$ has the form $\sigma_1|_D\varsigma_2$ for some $\sigma_1 \in \mathit{Arg}^*$, $D \in \{L, R\}$ and $\varsigma_2$ a synchron, and $\varsigma' = \mathit{static}(\sigma_1)|_D\varsigma_2$.

**Definition 7.** Two synchrons $\varsigma$ and $\upsilon$ are *directly concurrent*, notation $\varsigma \smile_d \upsilon$, if $\varsigma$ has the form $\sigma_1|_D\varsigma_2$ and $\upsilon = \sigma_1|_E\upsilon_2$ with $\{D, E\} = \{L, R\}$. Two synchrons $\varsigma$ and $\upsilon$ are *concurrent*, notation $\varsigma \smile \upsilon$, if there are $\varsigma'$ and $\upsilon'$ with $\varsigma \leadsto \varsigma'$, $\upsilon \leadsto \upsilon'$ and $\varsigma' \smile_d \upsilon'$.

*Necessary and Active Synchrons.* All synchrons of the form $\sigma(\stackrel{\alpha}{\to}P)$ are *active*; their execution causes a transition $\alpha.P \stackrel{\alpha}{\longrightarrow} P$ in the relevant component of the represented system. Synchrons $\sigma(P{\to}s)$ and $\sigma(b{:})$ are *passive*; they do not effect any state change. Let $a\varsigma(t)$ denote the set of active synchrons of a derivation $t$. So a transition $t$ is labelled by a signal, i.e. $\ell(t) \notin \mathit{Act}$, iff $a\varsigma(t) = \emptyset$.

Whether a synchron $\varsigma \in \varsigma(t)$ is *necessary* for $t$ to occur is defined only for $t \in \mathit{Tr}^\bullet$. If $t$ is the derivation of a broadcast transition, i.e., $\ell(t) = b!$ for some $b \in \mathcal{B}$, then exactly one synchron $\upsilon \in \varsigma(t)$ is of the form $\sigma(\stackrel{b!}{\to}P)$, while all the other $\varsigma \in \varsigma(t)$ are of the form $\sigma'(\stackrel{b?}{\to}Q)$ (or possibly $\sigma'(b{:})$ in ABCd). Only the synchron $\upsilon$ is necessary for the broadcast to occur, as a broadcast is unaffected by whether or not someone listens to it. Hence we define $n\varsigma(t) := \{\upsilon\}$. For all $t \in \mathit{Tr}^\bullet$ with $\ell(t) \notin \mathcal{B}!$ (i.e. $\ell(t) \in \mathcal{S} \cup C_h \cup \bar{C}_h \cup \{\tau\}$) we set $n\varsigma(t) := \varsigma(t)$, thereby declaring all synchrons of the derivation necessary.

**Definition 8.** A derivation $t' \in \mathit{Tr}^\bullet$ is a *possible successor* of a derivation $t \in \mathit{Tr}^\bullet$, notation $t \leadsto t'$, if $t$ and $t'$ have equally many necessary synchrons and each necessary synchron of $t'$ is a possible successor of one of $t$; i.e., if $|n\varsigma(t)| = |n\varsigma(t')|$ and $\forall \varsigma' \in n\varsigma(t').\ \exists \varsigma \in n\varsigma(t).\ \varsigma \leadsto \varsigma'$.

This implies that the relation $\leadsto$ between $n\varsigma(t)$ and $n\varsigma(t')$ is a bijection.

**Definition 9.** Derivation $t \in \mathit{Tr}^\bullet$ is *unaffected by* $u$, notation t ⌣• u, if $\forall \varsigma \in n\varsigma(t).\ \forall \upsilon \in a\varsigma(u).\ \varsigma \smile \upsilon$.

So t is unaffected by u if no active synchron of u interferes with a necessary synchron of t. Passive synchrons do not interfere at all.

In Example 3 one has $t_d \smile t_e$, $t_d \leadsto t'_d$ and $t'_d \smile t_e$. Here $t \smile u$ denotes t ⌣• u ∧ u ⌣• t.

**Proposition 2.** The relations ❀ and ⌣• satisfy the properties (4)–(7).

### **5 Components**

This section proposes a concept of system components associated to a transition, with a classification of components as necessary and/or affected. We then define a concurrency relation ⌣•s in terms of these components, closely mirroring the definition of the concurrency relation ⌣• in terms of synchrons (Definition 9 in Sect. 4). We show that ⌣• and ⌣•s, as well as the concurrency relation defined in terms of components in Sect. 2, give rise to the same concept of justness.

<sup>A</sup> *static component* is a string <sup>σ</sup> <sup>∈</sup> *Arg*<sup>∗</sup> of static arguments. Let *<sup>C</sup>* be the set of static components. The *static component* c(ς) of a synchron ς is defined to be the largest prefix γ of ς that is a static component.

Let *comp*(t) := {c(ς) <sup>|</sup> <sup>ς</sup> <sup>∈</sup> <sup>ς</sup>(t)} be the set of *static components* of <sup>t</sup>. Moreover, *npc*(t) := {c(ς) <sup>|</sup> <sup>ς</sup> <sup>∈</sup> nς(t)} and *afc*(t) := {c(ς) <sup>|</sup> <sup>ς</sup> <sup>∈</sup> aς(t)} are the *necessary* and *affected* static components of <sup>t</sup> <sup>∈</sup> *Tr*. Since nς(t) <sup>⊆</sup> <sup>ς</sup>(t) and aς(t) <sup>⊆</sup> <sup>ς</sup>(t), we have *npc*(t) <sup>⊆</sup> *comp*(t) and *afc*(t) <sup>⊆</sup> *comp*(t).

Two static components $\gamma$ and $\delta$ are *concurrent*, notation $\gamma \smile \delta$, if $\gamma = \sigma_1|_D\gamma_2$ and $\delta = \sigma_1|_E\delta_2$ with $\{D, E\} = \{L, R\}$.

**Definition 10.** Derivation $t \in \mathit{Tr}^\bullet$ is *statically unaffected by* $u$, notation t ⌣•s u, iff $\forall \gamma \in \mathit{npc}(t).\ \forall \delta \in \mathit{afc}(u).\ \gamma \smile \delta$.

**Proposition 3.** If t ⌣•s u then t ⌣• u.

In Example 3 we have $t_d \smile t_e$ but not $t_d \smile_s t_e$, for $\mathit{npc}(t_e) = \mathit{comp}(t_e) = \mathit{comp}(t_d) = \mathit{afc}(t_d) = \{\backslash c\,|_L\}$. Here $t \smile_s u$ denotes t ⌣•s u ∧ u ⌣•s t. Hence the implication of Proposition 3 is strict.
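The component-based notions admit the same kind of executable sketch (again over our own tuple encoding of synchrons, purely for illustration):

```python
# Sketch (encoding ours): the static component c(ς) of a synchron and
# the component-based relations of Definitions 10 and 11.

def is_static(a):
    return a in ("|L", "|R") or a.startswith("\\") or a.startswith("[")

def component(s):
    """c(ς): the largest prefix of ς consisting of static arguments."""
    out = []
    for a in s[:-1]:               # the final element is the leaf
        if not is_static(a):
            break
        out.append(a)
    return tuple(out)

def conc_components(g, d):
    """γ ⌣ δ: a common prefix, then |L on one side and |R on the other."""
    i = 0
    while i < len(g) and i < len(d) and g[i] == d[i]:
        i += 1
    return i < len(g) and i < len(d) and {g[i], d[i]} == {"|L", "|R"}

def statically_unaffected(npc_t, afc_u):
    """t ⌣•s u (Definition 10), given npc(t) and afc(u) as component sets."""
    return all(conc_components(g, d) for g in npc_t for d in afc_u)

def component_unaffected(npc_t, afc_u):
    """t ⌣•c u (Definition 11): npc(t) and afc(u) are disjoint."""
    return not (set(npc_t) & set(afc_u))
```

On Example 3, the $d$- and $e$-synchrons both collapse to the same static component, so neither relation regards $t_d$ and $t_e$ as concurrent, matching the strictness observation above.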

**Proposition 4.** The functions $\mathit{npc}$ and $\mathit{afc} : \mathit{Tr} \to \mathcal{P}(\mathcal{C})$ satisfy closure property (3) of Sect. 2.

The concurrency relation ⌣•c defined in terms of static components according to the template in [16], recalled in Sect. 2, is not identical to ⌣•s:

**Definition 11.** Let $t, u$ be derivations. Write t ⌣•c u iff $\mathit{npc}(t) \cap \mathit{afc}(u) = \emptyset$.

Nevertheless, we show that for the study of justness it makes no difference whether justness is defined using the concurrency relation ⌣•, ⌣•s or ⌣•c.

**Theorem 2.** A path is ⌣•-B-just iff it is ⌣•c-B-just iff it is ⌣•s-B-just.

#### **6 A Coinductive Characterisation of Justness**

In this section we show that the ⌣•-based concept of justness defined in this paper coincides, for CCS and ABC, with a coinductively defined concept of justness originating from [15]. To state the coinductive definition of justness, we need the notion of the decomposition of a path starting from a process with a leading static operator.

Any derivation $t \in \mathit{Tr}$ of a transition with $\mathit{src}(t) = P|Q$ has one of the shapes

– $u|Q$, with $\mathit{target}(t) = \mathit{target}(u)|Q$,
– $u|v$, with $\mathit{target}(t) = \mathit{target}(u)|\mathit{target}(v)$,
– or $P|v$, with $\mathit{target}(t) = P|\mathit{target}(v)$.

Let a path *of* a process $P$ be a path as in Definition 2 starting with $P$. Now the *decomposition* of a path $\pi$ of $P|Q$ into paths $\pi_1$ and $\pi_2$ of $P$ and $Q$, respectively, is obtained by concatenating all left-projections of the states and transitions of $\pi$ into a path of $P$ and all right-projections into a path of $Q$—notation $\pi \equiv \pi_1|\pi_2$. Here it could be that $\pi$ is infinite, yet either $\pi_1$ or $\pi_2$ (but not both) is finite.

Likewise, $t \in \mathit{Tr}$ with $\mathit{src}(t) = P[f]$ has the shape $u[f]$, with $\mathit{target}(t) = \mathit{target}(u)[f]$. The *decomposition* $\pi'$ of a path $\pi$ of $P[f]$ is the path obtained by leaving out the outermost $[f]$ of all states and transitions in $\pi$, notation $\pi \equiv \pi'[f]$. In the same way one defines the decomposition of a path of $P\backslash c$.
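The parallel decomposition can be sketched concretely. In the toy Python fragment below (an encoding of our own), a path of $P|Q$ alternates composite states $(p, q)$ with transition tags $(\ell_L, \ell_R)$, where `None` marks the side that idles; projecting out each side yields $\pi_1$ and $\pi_2$:

```python
# Sketch (encoding ours): a path of P|Q is a list alternating states
# (p, q) and transitions (lab_left, lab_right), None marking an idle
# side. decompose() returns the left and right projections.

def decompose(path):
    """Split a path of P|Q into its left and right component paths."""
    left, right = [path[0][0]], [path[0][1]]
    for i in range(1, len(path), 2):
        (ll, rl), (p, q) = path[i], path[i + 1]
        if ll is not None:
            left += [ll, p]        # the left component takes a step
        if rl is not None:
            right += [rl, q]       # the right component takes a step
    return left, right
```

Note how a step of only one side contributes nothing to the other projection, which is exactly why an infinite path of $P|Q$ may project to a finite path of one component.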

The following coinductive definition of the family of B-justness predicates on paths, with one family member for each choice of a set $B$ of blocking actions, stems from [15, Appendix E]—here $\bar{D} := \{\bar{c} \mid c \in D\}$.

**Definition 12.** *B-justness*, for $\mathcal{B}? \subseteq B \subseteq \mathit{Act}$, is the largest family of predicates on the paths in the LTS of ABC such that


Intuitively, justness is a completeness criterion, telling which paths can actually occur as runs of the represented system. A path is B-just if it can occur in an environment that may block the actions in $B$. In this light, the first, third, fourth and fifth requirements above are intuitively plausible. The second requirement says, first of all, that if $\pi \equiv \pi_1|\pi_2$ and $\pi$ can occur in an environment that may block the actions in $B$, then $\pi_1$ and $\pi_2$ must be able to occur in such an environment as well, or in environments blocking less. The last clause in this requirement prevents a C-just path of $P$ and a D-just path of $Q$ from composing into a B-just path of $P|Q$ when $C$ contains an action $c$ and $D$ the complementary action $\bar{c}$ (except when $\tau \in B$). The reason is that no environment (except one that can block $\tau$-actions) can block both actions for their respective components, as nothing can prevent them from synchronising with each other.

The fifth requirement helps to characterise processes of the form $b + (A|b)$ and $a.(A|b)$, with $A \stackrel{\mathit{def}}{=} a.A$. Here, the first transition 'gets rid of' the choice and of the leading action $a$, respectively, and this requirement reduces the justness of paths of such processes to the justness of their suffixes.

**Example 6.** To illustrate Definition 12 consider the unique infinite path of the process Alice|Cataline of Example <sup>2</sup> in which the transition <sup>t</sup> does not occur. Taking the empty set of blocking actions, we ask whether this path is ∅-just. If it were, then by the second requirement of Definition 12 the projection of this path on the process Cataline would need to be ∅-just as well. This is the path 1 (without any transitions) in Example 1. It is not ∅-just by the first requirement of Definition 12, because its last state 1 admits a transition.

We now establish that the concept of justness from Definition 12 agrees with the concept of justness defined earlier in this paper.

**Theorem 3.** A path is ⌣•s-B-just iff it is B-just in the sense of Definition 12.

If a path $\pi$ is B-just then it is C-just for any $C \supseteq B$. Moreover, the collection of sets $B$ such that a given path $\pi$ is B-just is closed under arbitrary intersection, and thus there is a least set $B_\pi$ such that $\pi$ is $B_\pi$-just. Actions $\alpha \in B_\pi$ are called $\pi$*-enabled* [14]. A path is called *just* (without a predicate B) iff it is B-just for some $\mathcal{B}? \subseteq B \subseteq \mathcal{B}? \mathbin{\dot\cup} C_h \mathbin{\dot\cup} \bar{C}_h \mathbin{\dot\cup} \mathcal{S}$ [3,6,14,15], which is the case iff it is $\mathcal{B}? \mathbin{\dot\cup} C_h \mathbin{\dot\cup} \bar{C}_h \mathbin{\dot\cup} \mathcal{S}$-just.

In [3] a definition of justness for CCS with signal transitions appears, very similar to Definition 12; it also applies to CCSS as presented here. Generalising Theorem 3, one can show that a path is (⌣•s- or ⌣•c- or) ⌣•-just iff it is just in this sense. The same holds for the coinductive definition of justness from [6].

# **7 Conclusion**

We advocate justness as a reasonable completeness criterion for formalising liveness properties when modelling distributed systems by means of transition systems. In [16] we proposed a definition of justness in terms of a, possibly asymmetric, concurrency relation between transitions. The current paper defined such a concurrency relation for the transition systems associated to CCS, as well as its extensions with broadcast communication and signals, thereby making the definition of justness from [16] available to these languages. In fact, we provided three versions of the concurrency relation, and showed that they all give rise to the same concept of justness. We expect that this style of definition will carry over to many other process algebras. We showed that justness satisfies the criterion of feasibility, and proved that our formalisation agrees with previous coinductive formalisations of justness for these languages.

Concurrency relations between transitions in transition systems have been studied in [28]. Our concurrency relation ⌣• follows the same computational intuition. However, in [28] transitions are classified as concurrent or not only when they have the same source, whereas, as a basis for the definition of justness, here we need to compare transitions with different sources. Apart from that, our concurrency relation is more general in that it satisfies fewer closure properties, and moreover is allowed to be asymmetric.

Concurrency is represented explicitly in models like Petri nets [26], event structures [29], or asynchronous transition systems [2,27,30]. We believe that the semantics of CCS in terms of such models agrees with its semantics in terms of labelled transition systems with a concurrency relation as given here. However, formalising such a claim requires a choice of an adequate justness-preserving semantic equivalence defined on the compared models. Development of such semantic equivalences is a topic for future research.

**Acknowledgement.** I am grateful to Peter Höfner, Victor Dyseryn and Filippo de Bortoli for valuable feedback.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Path Category for Free Open Morphisms from Coalgebras with Non-deterministic Branching**

Thorsten Wißmann<sup>1(B)</sup>, Jérémy Dubut<sup>2,3</sup>, Shin-ya Katsumata<sup>2</sup>, and Ichiro Hasuo<sup>2,4</sup>

<sup>1</sup> Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany thorsten.wissmann@fau.de <sup>2</sup> National Institute of Informatics, Tokyo, Japan {dubut,s-katsumata,hasuo}@nii.ac.jp <sup>3</sup> Japanese-French Laboratory for Informatics, Tokyo, Japan <sup>4</sup> SOKENDAI, Hayama, Kanagawa, Japan

**Abstract.** There are different categorical approaches to variations of transition systems and their bisimulations. One is coalgebra for a functor G, where a bisimulation is defined as a span of G-coalgebra homomorphisms. Another one is in terms of path categories and open morphisms, where a bisimulation is defined as a span of open morphisms. This similarity is no coincidence: given a functor G fulfilling certain conditions, we derive a path-category for pointed G-coalgebras and lax homomorphisms, such that the open morphisms turn out to be precisely the G-coalgebra homomorphisms. The above construction provides path-categories and trace semantics for free for different flavours of transition systems: (1) non-deterministic tree automata, (2) regular nondeterministic nominal automata (RNNA), an expressive automata notion living in nominal sets, and (3) multisorted transition systems. This last instance relates to Lasota's construction, which goes in the converse direction.

**Keywords:** Coalgebra · Open maps · Categories · Nominal sets

# **1 Introduction**

*Coalgebras* [25] and *open maps* [16] are two main categorical approaches to transition systems and bisimulations. The former describes the branching type of systems as an endofunctor, a system becoming a coalgebra and bisimulations being spans of coalgebra homomorphisms. Coalgebra theory makes it easy to consider state space types in different settings, e.g. nominal sets [17,18] or algebraic categories [5,11,20]. The latter, open maps, describes systems as objects of

This research was supported by ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), JST. The first author was supported by the DFG project MI 717/5-1. He expresses his gratitude for having been invited to Tokyo, which initiated the present work.


**Table 1.** Two approaches to categorical (bi)simulations

a category and the execution types as particular objects called paths. In this case, bisimulations are spans of open morphisms. Open maps are particularly adapted to extend bisimilarity to history dependent behaviors, e.g. true concurrency [7,8], timed systems [22] and weak (bi)similarity [9]. Coalgebra homomorphisms and open maps are then key concepts to describe bisimilarity categorically. They intuitively correspond to functional bisimulations, that is, those maps between states whose graph is a bisimulation.

We are naturally interested in the relationship between those two categorical approaches to transition systems and bisimulations. A reduction of open maps situations to coalgebra was given by Lasota using multi-sorted transition systems [19]. In this paper, we give the reduction in the other direction: from the category Coalgₗ(TF) of pointed TF-coalgebras and lax homomorphisms, we construct the path-category Path and a functor J : Path → Coalgₗ(TF) such that Path-open morphisms coincide with strict homomorphisms, hence with functional bisimulations. Here, T is a functor describing the branching behaviour and F describes the input type, i.e. the type of data that is processed (e.g. words or trees). This development is carried out for the case where T is a powerset-like functor, and thus covers transition systems allowing non-deterministic branching.

The key concept in the construction of Path is that of F-*precise maps*. Roughly speaking, in Set, a map f : X −→ F Y is F-precise if every y ∈ Y is used precisely once in f, i.e. there is a unique x such that y appears in f(x), and additionally y appears precisely once in f(x). Such an F-precise map represents one deterministic step (of shape F). A path P ∈ Path is then a finite sequence of deterministic steps, i.e. finitely many precise maps, and J converts such data into a pointed T F-coalgebra. There are many existing notions of paths and traces in coalgebra [4,12,13,21], but they lack the notion of *precise* map, which is crucial for the present work.

Once we have set up the situation J : Path −→ Coalgl(T F), we are in the framework of open map bisimulations. Our construction of Path using precise maps is justified by the characterisation theorem: Path-open morphisms and strict coalgebra homomorphisms coincide (Theorems 3.20 and 3.24). This coincidence relies on the concept of path-reachable coalgebras, namely coalgebras in which every state can be reached by a path. Under mild conditions, path-reachability is equivalent to an existing notion in coalgebra, defined as the non-existence of a proper sub-coalgebra (Sect. 3.5). Additionally, this characterization yields a canonical trace semantics for free, given in terms of paths (Sect. 3.6).

We illustrate our reduction with several concrete situations: different classes of non-deterministic top-down tree automata using analytic functors (Sect. 4.1); Regular Nondeterministic Nominal Automata (RNNA), an expressive automaton model living in nominal sets (Sect. 4.2); and multisorted transition systems, used in Lasota's work to construct a coalgebra situation from an open map situation (Sect. 4.3).

*Notation.* We assume basic categorical knowledge and notation (see e.g. [1,3]). The cotupling of morphisms f : A → C, g : B → C is denoted by [f, g]: A + B → C, and the unique morphism to the terminal object is !: X → 1 for every X.

# **2 Two Categorical Approaches for Bisimulations**

We introduce the two formalisms involved in the present paper: open maps (Sect. 2.1) and coalgebras (Sect. 2.2). Both formalisms will be illustrated on the classic example of Labelled Transition Systems (LTSs).

**Definition 2.1.** *Fix a set* A*, called the alphabet. A* labelled transition system *is a triple* (S, i, Δ) *with* S *a set of* states*,* i ∈ S *the* initial state*, and* Δ ⊆ S × A × S *the* transition relation*. When* Δ *is obvious from the context, we write* $s \xrightarrow{a} s'$ *to mean* (s, a, s') ∈ Δ*.*

For instance, the tuple ({0, ···, n}, 0, {(k − 1, a_k, k) | 1 ≤ k ≤ n}) is an LTS, called the *linear system* over the word a₁ ··· a_n ∈ A*. To relate LTSs, one considers functions that preserve the structure of LTSs:

**Definition 2.2.** *A* morphism of LTSs *from* (S, i, Δ) *to* (S', i', Δ') *is a function* f : S −→ S' *such that* f(i) = i' *and for every* (s, a, s') ∈ Δ*,* (f(s), a, f(s')) ∈ Δ'*. LTSs and morphisms of LTSs form a category, which we denote by* LTSA*.*

Some authors choose other notions of morphisms (e.g. [16]), allowing them to operate between LTSs with different alphabets for example. The usual way of comparing LTSs is by using simulations and bisimulations [23]. The former describes what it means for a system to have at least the behaviours of another, the latter describes that two systems have exactly the same behaviours. Concretely:

**Definition 2.3.** *A* simulation *from* (S, i, Δ) *to* (S', i', Δ') *is a relation* R ⊆ S × S' *such that (1)* (i, i') ∈ R*, and (2) for every* $s \xrightarrow{a} t$ *and* (s, s') ∈ R*, there is* t' ∈ S' *such that* $s' \xrightarrow{a} t'$ *and* (t, t') ∈ R*. Such a relation* R *is a* bisimulation *if* R⁻¹ = {(s', s) | (s, s') ∈ R} *is also a simulation.*
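For finite systems, Definition 2.3 can be checked directly by exhaustive search. The following Python sketch is our own illustration (not part of the paper's development; all names are ours): an LTS is a triple (S, i, Δ) with Δ a set of (s, a, s') triples.

```python
# Illustrative sketch (ours): finite LTSs and a direct check of
# Definition 2.3 for a given relation R.

def is_simulation(R, lts1, lts2):
    """Is R a simulation from lts1 to lts2?"""
    (S1, i1, D1), (S2, i2, D2) = lts1, lts2
    if (i1, i2) not in R:                      # condition (1)
        return False
    for (s, s2) in R:                          # condition (2)
        for (src, a, t) in D1:
            if src == s and not any(
                    (s2, a, t2) in D2 and (t, t2) in R for t2 in S2):
                return False
    return True

def is_bisimulation(R, lts1, lts2):
    Rinv = {(s2, s1) for (s1, s2) in R}
    return is_simulation(R, lts1, lts2) and is_simulation(Rinv, lts2, lts1)

# The linear system over the word 'ab', and a bisimilar LTS that
# duplicates the 'b'-successor:
lin_ab = ({0, 1, 2}, 0, {(0, 'a', 1), (1, 'b', 2)})
T = ({'i', 'p', 'q1', 'q2'}, 'i',
     {('i', 'a', 'p'), ('p', 'b', 'q1'), ('p', 'b', 'q2')})
R = {(0, 'i'), (1, 'p'), (2, 'q1'), (2, 'q2')}
```

Note that R relates the final state 2 to both 'q1' and 'q2'; a bisimulation need not be a function.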

Morphisms of LTSs are functional simulations, i.e. functions between states whose graph is a simulation. So how can we model (1) systems, (2) functional simulations, and (3) functional bisimulations categorically? In the next two sections, we describe known answers to this question, with open maps and coalgebras. In both cases, it is possible to capture similarity and bisimilarity of two LTSs T and T'. Generally, a simulation is a (jointly monic) span of a functional bisimulation and a functional simulation, and a bisimulation is a simulation whose converse is also a simulation, as depicted in Table 1. Consequently, to understand similarity and bisimilarity on a general level, it is enough to understand functional simulations and bisimulations.

#### **2.1 Open Maps**

The categorical framework of open maps [16] assumes functional simulations to be already modeled as a category **M**. For example, for **M** := LTSA, objects are LTSs, and morphisms are functional simulations. Furthermore, the open maps framework assumes another category **P** of 'paths' or 'linear systems', together with a functor J that tells how a 'path' is to be understood as a system:

**Definition 2.4** [16]**.** *An* open map situation *is given by categories* **M** *('systems' with 'functional simulations') and* **<sup>P</sup>** *('paths') together with a functor* J : **<sup>P</sup>** <sup>→</sup> **<sup>M</sup>***.*

For example with **M** := LTSA, we pick **P** := (A*, ≤) to be the poset of words over A with the prefix order. Here, the functor J maps a word w ∈ A* to the linear system over w, and w ≤ v to the evident functional simulation J(w ≤ v): Jw −→ Jv.

In an open map situation J : **P** −→ **M**, we can abstractly represent the concept of a *run* in a system. A run of a path w ∈ **P** in a system T ∈ **M** is simply defined to be an **M**-morphism of type Jw −→ T. With this definition, each **M**-morphism h: T −→ T' (i.e. functional simulation) inherently transfers runs: given a run x: Jw −→ T, the morphism h · x: Jw −→ T' is a run of w in T'. In the example open map situation J : (A*, ≤) −→ LTSA, a run of a path w = a₁ ··· a_n ∈ A* in an LTS T = (S, i, Δ) is nothing but a sequence of states x₀, ..., x_n ∈ S such that x₀ = i and $x_{k-1} \xrightarrow{a_k} x_k$ holds for all 1 ≤ k ≤ n.

We introduce the concept of open map [16]. This is an abstraction of the property possessed by *functional bisimulations*. For LTSs T = (S, i, Δ) and T' = (S', i', Δ'), an LTSA-morphism h: T −→ T' is a functional bisimulation if the graph of h is a bisimulation. This implies the following relationship between runs in T and runs in T'. Suppose that w ≤ w' holds in A*, and a run x of w in T is given as in (1); here n, m are the lengths of w, w' respectively. Then for any run y' of w' in T' extending h · x as in (2), there is a run x' of w' extending x, and moreover its image under h coincides with y' (that is, h · x' = y'). Such an x' is obtained by repeatedly applying the condition of functional bisimulation.

$$\overbrace{i \xrightarrow{w_1} x_1 \xrightarrow{w_2} \cdots \xrightarrow{w_n} x_n}^{x} \xrightarrow{w'_{n+1}} x'_{n+1} \xrightarrow{w'_{n+2}} \cdots \xrightarrow{w'_m} x'_m \quad \text{(in } T\text{)} \tag{1}$$

$$\underbrace{i' \xrightarrow{w_1} h(x_1) \xrightarrow{w_2} \cdots \xrightarrow{w_n} h(x_n)}_{h \cdot x} \xrightarrow{w'_{n+1}} y'_{n+1} \xrightarrow{w'_{n+2}} \cdots \xrightarrow{w'_m} y'_m \quad \text{(in } T'\text{)} \tag{2}$$

Observe that y' extending h · x can be represented as y' · J(w ≤ w') = h · x, and x' extending x as x' · J(w ≤ w') = x. From these, we conclude that if an LTSA-morphism h: T −→ T' is a functional bisimulation, then for any w ≤ w' in A*, any run x: Jw −→ T and any y' : Jw' −→ T' such that y' · J(w ≤ w') = h · x, there is a run x' : Jw' −→ T such that x' · J(w ≤ w') = x and h · x' = y' (the converse also holds if all states of T are reachable). This necessary condition of functional bisimulation can be rephrased in any open map situation, leading us to the definition of open map.

**Definition 2.5** [16]**.** *Let* J : **P** −→ **M** *be an open map situation. An* **M***-morphism* h: T −→ T' *is said to be* open *if for every morphism* Φ: w −→ w' *in* **P** *and every commuting square* h · x = y' · JΦ *(with* x: Jw −→ T *and* y' : Jw' −→ T'*), there is* x' : Jw' −→ T *making the two triangles commute, i.e.* x' · JΦ = x *and* h · x' = y'*.*

Open maps are closed under composition and stable under pullback [16].

#### **2.2 Coalgebras**

The theory of G-coalgebras is another categorical framework to study bisimulations. The type of systems is modelled using an endofunctor G: **C** −→ **C**, and a system is then a coalgebra for this functor, that is, a pair of an object S of **C** (modelling the state space) and a morphism of type S −→ GS (modelling the transitions). For example, for LTSs, the transition relation is of type Δ ⊆ S × A × S. Equivalently, this can be defined as a function Δ: S −→ P(A × S), where P is the powerset. In other words, the transition relation is a coalgebra for the Set-functor P(A × −). Intuitively, this coalgebra gives the one-step behaviour of an LTS: S describes the state space of the system, P describes the 'branching type' as being non-deterministic, A × S describes the 'computation type' as being linear, and the function itself lists all possible futures after one step of computation of the system. Changing the underlying category or the endofunctor allows one to model different types of systems. This is the usual framework of coalgebra, as described for example in [25].
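The repackaging of Δ ⊆ S × A × S as a coalgebra α: S −→ P(A × S) is mechanical; the following Python sketch is our own illustration (names are ours):

```python
# Sketch (ours): the transition relation Delta of an LTS, repackaged
# as a coalgebra alpha: S -> P(A × S) for the Set-functor P(A × -).

def to_coalgebra(S, Delta):
    return {s: frozenset((a, t) for (src, a, t) in Delta if src == s)
            for s in S}

Delta = {(0, 'a', 1), (0, 'b', 2), (1, 'a', 1)}
alpha = to_coalgebra({0, 1, 2}, Delta)
# alpha[s] lists all possible one-step futures of s; alpha[2] is the
# empty set, i.e. state 2 is deadlocked.
```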

Initial states are modelled coalgebraically by a pointing to the carrier i: I−→ S for a fixed object I in **<sup>C</sup>**, describing the 'type of initial states' (see e.g. [2, Sec. 3B]). For example, an initial state of an LTS is the same as a function from the singleton set I := {∗} to the state space S. This object I will often be the final object of **<sup>C</sup>**, but we will see other examples later. In total, an I*-pointed* G*coalgebra* is a **<sup>C</sup>**-object S together with morphisms α: S −→ GS and i: I −→ S. E.g. an LTS is an I-pointed G-coalgebra for I <sup>=</sup> {∗} and GX <sup>=</sup> <sup>P</sup>(A <sup>×</sup> X).

In coalgebra, functional bisimulations are the first-class citizens: they are modelled as homomorphisms. The intuition is that homomorphisms preserve the initial state, and both preserve and reflect the one-step relation.

**Definition 2.6.** *An* I-pointed G-coalgebra homomorphism *from* $I \xrightarrow{i} S \xrightarrow{\alpha} GS$ *to* $I \xrightarrow{i'} S' \xrightarrow{\alpha'} GS'$ *is a morphism* f : S −→ S' *satisfying* f · i = i' *and* Gf · α = α' · f*.*

For instance, when G = P(A × −), one can easily see that a function f is a G-coalgebra homomorphism iff it is a functional bisimulation. Thus, if we want to capture functional simulations in LTSs, we need to weaken the condition of homomorphism to the inequality Gf(α(s)) ⊆ α'(f(s)) (instead of equality). To express this condition for general G-coalgebras, we introduce a partial order ⊑_{X,Y} on each homset **C**(X, GY) in a functorial manner.
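For G = P(A × −) and finite carriers, both the strict homomorphism condition and its lax weakening can be checked pointwise. A Python sketch (our illustration; coalgebras are encoded as dicts sending a state to its set of (a, successor) pairs):

```python
# Sketch (ours): for G = P(A × -), the strict condition
# Gf(alpha(s)) = alpha'(f(s)) and the lax one Gf(alpha(s)) ⊆ alpha'(f(s)).

def Gf(f, step):
    """Apply G on the map f to one set of (a, t) pairs."""
    return {(a, f[t]) for (a, t) in step}

def is_lax_hom(f, alpha, beta):       # functional simulation
    return all(Gf(f, alpha[s]) <= beta[f[s]] for s in alpha)

def is_strict_hom(f, alpha, beta):    # functional bisimulation
    return all(Gf(f, alpha[s]) == beta[f[s]] for s in alpha)

alpha = {0: {('a', 1)}, 1: set()}
beta = {'u': {('a', 'v'), ('b', 'v')}, 'v': set()}
f = {0: 'u', 1: 'v'}
# f is a lax homomorphism but not a strict one: 'u' has a b-step
# that state 0 cannot match.
```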

**Definition 2.7.** *A* partial order on G-homsets *is a functor* ⊑: **C**^op × **C** −→ Pos *such that* U · ⊑ = **C**(−, G−)*; here,* U : Pos −→ Set *is the forgetful functor from the category* Pos *of posets and monotone functions.*

The functoriality of ⊑ amounts to saying that f₁ ⊑ f₂ implies Gh · f₁ · g ⊑ Gh · f₂ · g.

**Definition 2.8.** *Given a partial order* ⊑ *on* G*-homsets, an* I-pointed lax G-coalgebra homomorphism f : (S, α, i) −→ (S', α', i') *is a morphism* f : S −→ S' *satisfying* f · i = i' *and* Gf · α ⊑ α' · f*. The* I*-pointed* G*-coalgebras and lax homomorphisms form a category, denoted by* Coalgl(I, G)*.*

**Conclusion 2.9.** In Set, with I = {∗}, G = P(A × −), define the order f ⊑ g in Set(X, P(A × Y)) iff for every x ∈ X, f(x) ⊆ g(x). Then Coalgl({∗}, P(A × −)) = LTSA. In particular, we have an open map situation

$$\mathbb{P} = (A^\star, \le) \quad \xrightarrow{J} \quad \mathbb{M} = \mathsf{LTS}_A = \mathsf{Coalg}_l(\{\ast\}, \mathcal{P}(A \times -))$$

and the open maps are precisely the coalgebra homomorphisms (for reachable LTSs). In this paper, we will construct a path category **<sup>P</sup>** for more general I and G, such that the open morphisms are precisely the coalgebra homomorphisms.

# **3 The Open Map Situation in Coalgebras**

Lasota's construction [19] transforms an open map situation J : **<sup>P</sup>** −→ **<sup>M</sup>** into a functor G (with a partial order on G-homsets), together with a functor Beh: **<sup>M</sup>** −→ Coalgl(I,G) that sends open maps to <sup>G</sup>-coalgebra homomorphisms (see Sect. 4.3 for details). In this paper, we provide a construction in the converse direction for functors G of a certain shape.

As exemplified by LTSs, it is a common pattern that G is the composition G = T F of two functors [12], where T is the branching type (e.g. partial, or non-deterministic) and F is the data type, or the 'linear behaviour' (words, trees, words modulo α-equivalence). If we instantiate our path-construction to T = P and F = A × −, we obtain the known open map situation for LTSs (Conclusion 2.9).

Fix a category **C** with pullbacks, functors T, F : **C** −→ **C**, an object I ∈ **C** and a partial order ⊑^T on T-homsets. They determine a coalgebra situation (**C**, I, TF, ⊑) where ⊑ is the partial order on T F-homsets defined by ⊑_{X,Y} = ⊑^T_{X,FY}. Under some conditions on T and F, we construct a path-category Path(I, F + 1) and an open map situation Path(I, F + 1) → Coalgl(I, TF) where T F-coalgebra homomorphisms and Path(I, F + 1)-open morphisms coincide.

**Fig. 1.** A non-precise map f that factors through the F-precise f' : X −→ Y' × Y' + {⊥}

#### **3.1 Precise Morphisms**

While the path category is intuitively clear for F X = A × X, it is not for inner functors F that model tree languages. For example, for F X = A + X × X, a PF-coalgebra models transition systems over binary trees with leaves labelled in A, instead of over words. Hence, the paths should be this kind of binary tree. We capture the notion of tree-like shape ("every node in a tree has precisely one route to the root") by the following abstract definition:

**Definition 3.1.** *For a functor* F : **C** −→ **C***, a morphism* s: S −→ F R *is called* F-precise *if for all* f : S −→ F C*,* g : C −→ D *and* h: R −→ D *the following implication holds:*

$$Fg \cdot f = Fh \cdot s \quad\Longrightarrow\quad \exists\, d\colon R \longrightarrow C \ \text{with}\ Fd \cdot s = f \ \text{and}\ g \cdot d = h.$$

*Remark 3.2.* If F preserves weak pullbacks, then a morphism s is F-precise iff it fulfils the above definition for g = id.

*Example 3.3.* Intuitively speaking, for a polynomial Set-functor F, a map s: S → F R is F-precise iff every element of R is mentioned precisely once in the definition of the map s. For example, for F X = A × X + {⊥}, the case needed later for LTSs, a map f : X −→ F Y is precise iff for every y ∈ Y, there is a unique pair (x, a) ∈ X × A such that f(x) = (a, y). For F X = X × X + {⊥} on Set, the map f : X −→ F Y in Fig. 1 is not F-precise, because y₂ is used three times (once in f(x₂) and twice in f(x₃)), and y₃ and y₄ do not occur in f at all. However, f' : X −→ F Y' is F-precise because every element of Y' is used precisely once in f', and we have that Fh · f' = f. Also note that f' defines a forest where X is the set of roots, which is closely connected to the intuition that, in the F-precise map f', from every element of Y' there is precisely one edge up to a root in X.
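For F X = A × X + {⊥}, preciseness is a finite counting condition, which the following Python sketch (our own illustration) checks directly; a map f : X −→ F Y is encoded as a dict sending x either to the tag ('bot',) or to a pair (a, y).

```python
# Sketch (ours): preciseness check for F X = A × X + {⊥}.
# f is F-precise iff every y in Y appears in exactly one f(x),
# and exactly once there.

def is_precise(f, Y):
    uses = {y: 0 for y in Y}
    for v in f.values():
        if v != ('bot',):
            a, y = v
            uses[y] += 1
    return all(n == 1 for n in uses.values())

f_good = {0: ('a', 'y0'), 1: ('bot',), 2: ('b', 'y1')}
f_bad = {0: ('a', 'y0'), 1: ('a', 'y0')}   # y0 used twice, y1 unused
```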

So when transforming a non-precise map into a precise map, one duplicates elements that are used multiple times and drops elements that are not used. We will cover functors F for which this factorization pattern provides F-precise maps. If F involves unordered structure, this factorization needs to make choices, and so we restrict the factorization to a class S of objects satisfying that choice principle (see Example 4.5 later):
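For F X = A × X + {⊥} this duplicate-and-drop factorization can be computed explicitly. A Python sketch (our illustration, with f : X −→ F Y encoded as a dict sending x to ('bot',) or to a pair (a, y)): each use of an element of Y gets a fresh copy in Y', and h : Y' −→ Y forgets the copying.

```python
# Sketch (ours): precise factorization for F X = A × X + {⊥}.
# Elements of Y used several times are duplicated; unused ones are dropped.

def precise_factor(f):
    Yp, fp, h = [], {}, {}
    for x, v in sorted(f.items()):
        if v == ('bot',):
            fp[x] = ('bot',)
        else:
            a, y = v
            fresh = len(Yp)       # one fresh copy of y per use
            Yp.append(fresh)
            h[fresh] = y          # h: Y' -> Y forgets the copying
            fp[x] = (a, fresh)
    return Yp, fp, h

f = {0: ('a', 'y0'), 1: ('a', 'y0'), 2: ('bot',)}
Yp, fp, h = precise_factor(f)
# Fh · f' = f holds by construction, and f' is precise:
# every element of Yp is used exactly once.
```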


**Definition 3.4.** *Fix a class of objects* S ⊆ **obj C** *closed under isomorphism. We say that* F admits precise factorizations w.r.t. S *if for every* f : S → F Y *with* S ∈ S*, there exist* Y' ∈ S*,* h: Y' → Y *and an* F*-precise* f' : S → F Y' *with* Fh · f' = f*.*

**Fig. 2.** A path of length 4 for *F X* = {*a*} × *X* + *X* × *X* + {⊥} with *I* = {∗}.

For **C** = Set, S contains all sets. However for the category of nominal sets, S will only contain the strong nominal sets (see details in Subsect. 4.2).

*Remark 3.5.* Precise morphisms are essentially unique: if f₁ : X −→ F Y₁ and f₂ : X −→ F Y₂ are F-precise and there is some h: Y₁ −→ Y₂ with Fh · f₁ = f₂, then h is an isomorphism. Consequently, if f : S −→ F Y with S ∈ S is F-precise and F admits precise factorizations, then Y ∈ S.

Functors admitting precise factorizations are closed under basic constructions:

**Proposition 3.6.** *The following functors admit precise factorizations w.r.t.* S*:*


*Example 3.7.* When **C** is infinitary extensive and S is closed under coproducts, every polynomial endofunctor F : **C** −→ **C** admits precise factorizations w.r.t. S. This is in particular the case for **C** = S = Set. In this case, we shall see later (Sect. 4.1) that many other Set-functors, e.g. the bag functor B, where B(X) is the set of finite multisets over X, have precise factorizations. In contrast, F = P does not admit precise factorizations: if f : X −→ PY is P-precise, then f(x) = ∅ for all x ∈ X.

#### **3.2 Path Categories in Pointed Coalgebras**

We define a path for I-pointed T F-coalgebras as a tree according to F. Following the observation in Example 3.3, one layer of the tree is modelled by an F-precise morphism, and hence a path in a T F-coalgebra is defined to be a finite sequence of (F + 1)-precise maps, where the + 1 accounts for the dead states w.r.t. T; the argument is given later in Remark 3.23, when reachability is discussed. Since the + 1 is not relevant yet, we define Path(I, F) in the following and will use Path(I, F + 1) later. For simplicity, we write $\bar{X}_n$ for finite families $(X_k)_{0 \le k < n}$.

**Definition 3.8.** *The category* Path(I, F) *consists of the following. An object is* $(\bar{P}_{n+1}, \bar{p}_n)$ *for some* n ∈ ℕ*, with* P₀ = I *and* $\bar{p}_n$ *a family of* F*-precise maps* $(p_k : P_k \to F P_{k+1})_{k < n}$*. We say that* $(\bar{P}_{n+1}, \bar{p}_n)$ *is a* path of length n*. A morphism* $\bar{\phi}_{n+1} : (\bar{P}_{n+1}, \bar{p}_n) \to (\bar{Q}_{m+1}, \bar{q}_m)$*,* m ≥ n*, is a family* $(\phi_k : P_k \to Q_k)_{k \le n}$ *with* φ₀ = id_I *and* $q_k \cdot \phi_k = F\phi_{k+1} \cdot p_k$ *for all* 0 ≤ k < n*.*

*Example 3.9.* Paths for F X = A × X + 1 and I = {∗} a singleton are as follows. First, a map f : I −→ F X is precise iff (up to isomorphism) either X = I and f(∗) = (a, ∗) for some a ∈ A, or X = ∅ and f(∗) = ⊥. Then a path is isomorphic to an object of the form: P_i = I for i ≤ k, P_i = ∅ for i > k, p_i(∗) = (a_i, ∗) for i < k, and p_k(∗) = ⊥. A path is the same as a word, plus some "junk"; concretely, a word in A*.⊥*. For LTSs, an object in Path(I, F) with F X = A × X is simply a word in A*. For a more complicated functor, Fig. 2 depicts a path of length 4, which is a tree for the signature with one unary symbol, one binary symbol, and a constant. The layers of the tree are the sets $\bar{P}_4$. Also note that since every p_i is F-precise, there is precisely one route from every element of a P_k to ∗.

*Remark 3.10.* The inductive continuation of Remark 3.5 is as follows. Given a morphism *<sup>φ</sup>*n+1 in Path(I,F), since <sup>φ</sup><sup>0</sup> is an isomorphism, then <sup>φ</sup><sup>k</sup> is an isomorphism for all 0 <sup>≤</sup> k <sup>≤</sup> n. If F admits precise factorizations and if I ∈ S, then for every path (*<sup>P</sup>* <sup>n</sup>+1, *<sup>p</sup>*n), all <sup>P</sup><sup>k</sup>, 0 <sup>≤</sup> <sup>k</sup> <sup>≤</sup> <sup>n</sup>, are in <sup>S</sup>.

*Remark 3.11.* If, in Definition 3.4, the connecting morphism h: Y' −→ Y exists uniquely, then it follows by induction that the hom-sets of Path(I, F) have at most one element. This is the case for all polynomial functors, but not for the bag functor on sets (discussed in Subsect. 4.1).

**Definition 3.12.** *The* path poset PathOrd(I, F) *is the set* $\coprod_{0 \le n} \mathbf{C}(I, F^n 1)$*, equipped with the following order: for* u: I −→ Fⁿ1 *and* v : I −→ Fᵐ1*, we define* u ≤ v *if* n ≤ m *and* Fⁿ! · v = u*, where* $F^n! \colon F^n F^{m-n} 1 = F^m 1 \to F^n 1$*.*

So u <sup>≤</sup> v if u is the truncation of v to n levels. This matches the morphisms in Path(I,F) that witnesses that one path is prefix of another:

**Proposition 3.13.** *1. The functor* Comp: Path(I, F) −→ PathOrd(I, F)*, defined on* $(\bar{P}_{n+1}, \bar{p}_n)$ *as the composite* $I = P_0 \xrightarrow{p_0} F P_1 \to \cdots \to F^n P_n \xrightarrow{F^n!} F^n 1$*, is full and reflects isos. 2. If* F *admits precise factorizations w.r.t.* S *and* I ∈ S*, then* Comp *is surjective. 3. If additionally* h *in Definition 3.4 is unique, then* Comp *has a right-inverse.*

In particular, PathOrd(I, F) is Path(I, F) up to isomorphism. In the instances, it is often easier to characterize PathOrd(I, F). This also shows that Path(I, F) contains the elements – understood as morphisms from I – of the finite start of the final chain of F: $1 \xleftarrow{!} F1 \xleftarrow{F!} F^2 1 \xleftarrow{F^2!} F^3 1 \longleftarrow \cdots$

*Example 3.14.* When F X = A × X + 1, Fⁿ1 is isomorphic to the set of words of length n in A*.⊥*. Consequently, PathOrd(I, F) is the set of words in A*.⊥*, equipped with the prefix order. In this case, Comp is an equivalence of categories.

#### **3.3 Embedding Paths into Pointed Coalgebras**

The paths $(\bar{P}_{n+1}, \bar{p}_n)$ embed into Coalgl(I, TF) as one expects for examples like Fig. 2: one takes the disjoint union of the P_k, the pointing is given by I = P₀, and the linear structure of F is embedded into the branching type T.

During the presentation of the results, we require T, F, and I to have certain properties, which will be introduced one after the other. The full list of assumptions is summarized in Table 2:

(Ax1) – The main theorem will show that the coalgebra homomorphisms in Coalgl(I, TF) are the open maps for the path category Path(I, F + 1). So from now on, we assume that **C** has finite coproducts; and, to use the results from the previous sections, we fix a class S ⊆ **obj C** such that F + 1 admits precise factorizations w.r.t. S and such that I ∈ S.

(Ax2) – Recall that a family of morphisms (e_i : X_i −→ Y)_{i∈I} with common codomain is called jointly epic if for all f, g : Y −→ Z, f · e_i = g · e_i for all i ∈ I implies f = g. For Set, this means that every element y ∈ Y is in the image of some e_i. Since we work with partial orders on T-homsets, we also need the corresponding property for the order: for f, g of the form Y −→ T Z', f · e_i ⊑ g · e_i for all i ∈ I implies f ⊑ g.

(Ax3) – In this section, we encode paths as pointed coalgebras by constructing a functor J : Path(I, F + 1) → Coalgl(I, TF). For that, we need to embed the linear behaviour F X + 1 into T F X. This is done by a natural transformation [η, ⊥]: Id + 1 −→ T, and we require that ⊥: 1 −→ T be a bottom element for ⊑.

*Example 3.15.* For the case where T is the powerset functor P, η is given by the unit η_X(x) = {x}, and ⊥ by the empty set: ⊥_X(∗) = ∅.

**Definition 3.16.** *We have an inclusion functor* J : Path(I, F + 1) → Coalgl(I, TF) *that maps a path* $(\bar{P}_{n+1}, \bar{p}_n)$ *to an* I*-pointed* T F*-coalgebra on* $\coprod \bar{P}_{n+1} := \coprod_{0 \le k \le n} P_k$*. The pointing is given by* $\mathrm{in}_0 \colon I = P_0 \to \coprod \bar{P}_{n+1}$ *and the structure by:*

$$\coprod_{0 \le k < n} P_k + P_n \xrightarrow{\;[(F\,\mathrm{in}_{k+1} + 1) \cdot p_k]_{0 \le k < n} \,+\, !\;} F\textstyle\coprod \bar{P}_{n+1} + 1 \xrightarrow{\;[\eta,\, \perp]\;} TF\textstyle\coprod \bar{P}_{n+1}.$$

*Example 3.17.* In the case of LTSs, a path, or equivalently a word a₁...a_k⊥...⊥ ∈ A*.⊥*, is mapped to the finite linear system over a₁...a_k (see Sect. 2.1), seen as a coalgebra (see Sect. 2.2).
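Concretely, the action of J on such a path can be sketched in Python (our illustration; '#' plays the role of ⊥): the word a₁...a_k⊥...⊥ becomes a pointed P(A × −)-coalgebra on the states {0, ..., k}.

```python
# Sketch (ours): J on LTS paths. A word a1...ak#...# ('#' standing for ⊥)
# becomes the linear system over a1...ak, as a pointed P(A × -)-coalgebra.

def path_to_coalgebra(word):
    letters = [a for a in word if a != '#']
    k = len(letters)
    alpha = {j: (frozenset({(letters[j], j + 1)}) if j < k else frozenset())
             for j in range(k + 1)}
    return 0, alpha        # pointing (state 0) and structure map

i0, alpha = path_to_coalgebra('ab##')
# One a-step from 0 to 1, one b-step from 1 to 2, then deadlock.
```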

**Proposition 3.18.** *Given a morphism* $[x_k]_{k \le n} \colon \coprod \bar{P}_{n+1} \to X$ *for some system* (X, ξ, x₀) *and a path* $(\bar{P}_{n+1}, \bar{p}_n)$*, we have*

$$[x_k]_{k \le n} \colon J(\bar{P}_{n+1}, \bar{p}_n) \longrightarrow (X, \xi, x_0) \ \text{in}\ \mathsf{Coalg}_l(I, TF) \iff \forall k < n\colon\ [\eta, \perp]_X \cdot (F x_{k+1} + 1) \cdot p_k \;\sqsubseteq\; \xi \cdot x_k.$$

Also note that the pointing <sup>x</sup><sup>0</sup> of the coalgebra is necessarily the first component of any run in it. In a run [x<sup>k</sup>]<sup>k</sup>≤<sup>n</sup>, <sup>p</sup><sup>k</sup> corresponds to an edge from <sup>x</sup><sup>k</sup> to <sup>x</sup><sup>k</sup>+1.

*Example 3.19.* For LTSs, since the P_k are singletons, x_k just picks the kth state of the run. The right-hand side of this lemma states that this is a run iff there is a transition from the kth state to the (k+1)th state.

#### **3.4 Open Morphisms Are Exactly Coalgebra Homomorphisms**

In this section, we prove our main contribution, namely that the Path(I, F + 1)-open maps in Coalgl(I, TF) are exactly the coalgebra homomorphisms. For the first direction of the main theorem, that is, that coalgebra homomorphisms are open, we need two extra axioms:

(Ax4) – describing that the order ⊑ on **C**(X, TY) is pointwise. This holds for the powerset because every set is the union of its singleton subsets.

(Ax5) – describing that **C**(X, TY) admits a choice principle. This holds for the powerset because whenever y ∈ h[x] for a map h: X −→ Y and a subset x ⊆ X, there is some {x'} ⊆ x with h(x') = y.

**Theorem 3.20.** *Under the assumptions of Table 2, a coalgebra homomorphism in* Coalgl(I,TF) *is* Path(I,F + 1)*-open.*



The converse is not true in general, because intuitively, open maps reflect runs, and thus only reflect edges of reachable states, as we have seen in Sect. 2.1. The notion of a state being reached by a path is the following:

**Definition 3.21.** *A system* (X, ξ, x₀) *is* path-reachable *if the family of all runs* $[x_k]_{k \le n} \colon J(\bar{P}_{n+1}, \bar{p}_n) \to (X, \xi, x_0)$ *(of paths from* Path(I, F + 1)*) is jointly epic.*

*Example 3.22.* For LTSs, this means that every state in X is reached by a run, that is, there is a path from the initial state to every state of X.

*Remark 3.23.* In Definition 3.21, it is crucial that we consider Path(I,F +1) and not Path(I,F) for functors incorporating 'arities <sup>≥</sup> 2'. This does not affect the example of LTSs, but for I = 1, F X <sup>=</sup> X <sup>×</sup> X and T <sup>=</sup> <sup>P</sup> in Set, the coalgebra (X, ξ, x<sup>0</sup>) on X <sup>=</sup> {x0, y1, y2, z1, z<sup>2</sup>} given by ξ(x<sup>0</sup>) = {(y1, y<sup>2</sup>)}, ξ(y<sup>1</sup>) = {(z1, z<sup>2</sup>)}, ξ(y<sup>2</sup>) = ξ(z<sup>1</sup>) = ξ(z<sup>2</sup>) = <sup>∅</sup> is path-reachable for Path(I,F + 1). There is no run of a length 2 path from Path(I,F), because <sup>y</sup><sup>2</sup> has no successors, and so there is no path to <sup>z</sup><sup>1</sup> or to <sup>z</sup><sup>2</sup>.

**Theorem 3.24.** *Under the assumptions of Table 2, if* $(X, \xi, x_0)$ *is path-reachable, then an open morphism* $h\colon (X, \xi, x_0) \longrightarrow (Y, \zeta, y_0)$ *is a coalgebra homomorphism.*

#### **3.5 Connection to Other Notions of Reachability**

There is another concise notion for reachability in the coalgebraic literature [2].

**Definition 3.25.** *A* subcoalgebra *of* $(X, \xi, x_0)$ *is a coalgebra homomorphism* $h\colon (Y, \zeta, y_0) \longrightarrow (X, \xi, x_0)$ *that is carried by a monomorphism* $h\colon Y \rightarrowtail X$*. Furthermore,* $(X, \xi, x_0)$ *is called* reachable *if it has no proper subcoalgebra, i.e. if every subcoalgebra* $h$ *is an isomorphism.*

Under the following assumptions, this notion coincides with the path-based definition of reachability (Definition 3.21).

**Assumption 3.26.** For the present Subsect. 3.5, let **C** be cocomplete, have (epi,mono)-factorizations and wide pullbacks of monomorphisms.

The first direction follows directly from Theorem 3.20:

**Proposition 3.27.** *Every path-reachable* $(X, \xi, x_0)$ *has no proper subcoalgebra.*

For the other direction, we need that $TF$ preserves arbitrary intersections, that is, wide pullbacks of monomorphisms. In Set, this means that for a family $(X_i \subseteq Y)_{i\in I}$ of subsets we have $\bigcap_{i\in I} TFX_i = TF\bigl(\bigcap_{i\in I} X_i\bigr)$ as subsets of $TFY$.

**Proposition 3.28.** *If, furthermore, for every monomorphism* $m\colon Y \longrightarrow Z$*, the function* $\mathbf{C}(-, Tm)\colon \mathbf{C}(X, TY) \longrightarrow \mathbf{C}(X, TZ)$ *reflects joins and* $TF$ *preserves arbitrary intersections, then a reachable coalgebra* $(X, \xi, x_0)$ *is also path-reachable.*

All those technical assumptions are satisfied in the case of LTSs, and will also be satisfied in all our instances in Sect. 4.

#### **3.6 Trace Semantics for Pointed Coalgebras**

The characterization from Theorems 3.20 and 3.24 points out a natural way of defining a trace semantics for pointed coalgebras. Indeed, the path category Path(I,F + 1) provides a natural way of defining the runs of a system. A possible way to go from runs to trace semantics is to describe accepting runs via the subcategory $J'\colon \mathsf{Path}(I,F) \hookrightarrow \mathsf{Path}(I,F+1)$. We define the *trace semantics* of a system $(X, \xi, x_0)$ as the set:

$$\mathsf{tr}(X,\xi,x\_0) = \{\, \mathsf{Comp}(\mathsf{P}\_{n+1},\mathsf{p}\_n) \mid \exists\,\mathrm{run}\ [x\_k]\_{k\le n} \colon JJ'(\mathsf{P}\_{n+1},\mathsf{p}\_n) \longrightarrow (X,\xi,x\_0) \text{ with } (\mathsf{P}\_{n+1},\mathsf{p}\_n) \in \mathsf{Path}(I,F) \,\}$$

Since Path(I,F)-open maps preserve and reflect runs, we have the following:

**Corollary 3.29.** $\mathsf{tr}\colon \mathsf{Coalgl}(I,TF) \longrightarrow (\mathcal{P}(\mathsf{PathOrd}(I,F)), \subseteq)$ *is a functor, and if* $f\colon (X, \xi, x_0) \longrightarrow (Y, \zeta, y_0)$ *is* Path(I,F + 1)*-open, then* $\mathsf{tr}(X, \xi, x_0) = \mathsf{tr}(Y, \zeta, y_0)$*.*

Let us look at two LTS-related examples (we will describe some others in the next section). First, take $FX = A \times X$. The usual trace semantics is given by all the words in $A^*$ that label a run of the system. This trace semantics is obtained because $\mathsf{PathOrd}(I,F) = \coprod_{n\ge 0} A^n$ and because Comp maps every path to its underlying word. Another example is given by $FX = A \times X + \{\checkmark\}$, where $\checkmark$ marks final states. In this case, a path in Path(I,F) of length $n$ either is a path that can still be extended or encodes fewer than $n$ steps to an accepting state $\checkmark$. This yields the trace semantics containing the set of accepted words, as in automata theory, plus the set of possibly infinite runs.
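As a minimal sketch of the second example (a hypothetical encoding, with `"tick"` standing in for the final-state marker $\checkmark$), the trace set of a small system can be computed directly:

```python
# Coalgebra for F X = A x X + {tick} over T = powerset: each state maps
# to a set of moves, either ("step", a, x') or ("tick",).
delta = {
    "q0": {("step", "a", "q1")},
    "q1": {("step", "b", "q0"), ("tick",)},
}

def traces(state, n):
    """Traces of runs of length <= n: extendable words plus tick-ended ones."""
    out = {()}                       # the empty partial run
    if n == 0:
        return out
    for move in delta[state]:
        if move == ("tick",):
            out.add(("tick",))       # an accepted word, as in automata theory
        else:
            _, a, nxt = move
            out |= {(a,) + t for t in traces(nxt, n - 1)}
    return out
```

For instance, `traces("q0", 2)` contains both the accepted word `("a", "tick")` and the extendable partial run `("a", "b")`.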

#### **4 Instances**

#### **4.1 Analytic Functors and Tree Automata**

In Example 3.7, we have seen that every polynomial Set-functor, in particular the functor $X \mapsto A \times X$, has precise factorizations with respect to all sets. This allowed us to see LTSs, modelled as $\{\ast\}$-pointed $\mathcal{P}(A \times -)$-coalgebras, as an instance of our theory, and in particular to describe their trace semantics using our path category in Sect. 3.6. This can be extended to tree automata as follows. Assume given a signature $\Sigma$, that is, a collection $(\Sigma_n)_{n\in\mathbb{N}}$ of disjoint sets. When $\sigma$ belongs to $\Sigma_n$, we say that $n$ is the *arity of* $\sigma$ or that $\sigma$ is a *symbol of arity* $n$. A top-down non-deterministic tree automaton as defined in [6] is then the same as a $\{\ast\}$-pointed $\mathcal{P}F$-coalgebra where $F$ is the polynomial functor $X \mapsto \coprod_{\sigma \in \Sigma_n} X^n$. For this functor, $F^n(1)$ is the set of trees over $\Sigma \cup \{\ast\}$ (with $\ast$ a fresh symbol of arity 0) of depth at most $n+1$ such that a leaf is labelled by $\ast$ if and only if it is at depth $n+1$. Intuitively, elements of $F^n(1)$ are partial runs of length $n$ that can possibly be extended. Then, the trace semantics of a tree automaton, seen as a pointed coalgebra, is given by the set of partial runs of the automaton. In particular, this contains the set of accepted finite trees, as those partial runs without any $\ast$, and the set of accepted infinite trees, encoded as the sequences of their depth-$n$ truncations for every $n$.

In the following, we would like to extend this to other kinds of tree automata by allowing some symmetries. For example, in a tree, we may not care about the order of the children. This boils down to quotienting the set $X^n$ of $n$-tuples by some permutations of the indices. This can be done in general: given a subgroup $G$ of the permutation group $\mathfrak{S}_n$ on $n$ elements, define $X^n/G$ as the quotient of $X^n$ under the equivalence relation $(x_1,\ldots,x_n) \equiv_G (y_1,\ldots,y_n)$ iff there is $\pi \in G$ such that $x_i = y_{\pi(i)}$ for all $i$. Concretely, this means that we replace the polynomial functor $F$ by a so-called *analytic functor*:

**Definition 4.1** [14,15]**.** *An* analytic Set*-functor is a functor of the form* $FX = \coprod_{\sigma \in \Sigma_n} X^n/G_\sigma$ *where for every* $\sigma \in \Sigma_n$*, we have a subgroup* $G_\sigma$ *of the permutation group* $\mathfrak{S}_n$ *on* $n$ *elements.*
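The quotient $X^n/G$ can be sketched concretely. The following Python snippet (a hypothetical encoding; `orbit` and `canon` are our names) computes equivalence classes and canonical representatives for a subgroup $G$ given as an explicit set of permutations:

```python
# A permutation of {0,...,n-1} is a tuple pi with pi[i] the image of i.
def orbit(t, G):
    """The equivalence class of tuple t under (x_i) == (y_{pi(i)}), pi in G.

    Since G is a group, enumerating all pi in G reaches every tuple
    equivalent to t, regardless of the direction of the rearrangement.
    """
    return {tuple(t[pi[i]] for i in range(len(t))) for pi in G}

def canon(t, G):
    """A canonical representative: the minimum of the orbit."""
    return min(orbit(t, G))

S2 = {(0, 1), (1, 0)}    # full group S_2: unordered pairs X^2 / S_2
triv = {(0, 1)}          # trivial subgroup: ordered pairs X^2
```

For instance, `canon((2, 1), S2)` and `canon((1, 2), S2)` coincide, while under the trivial subgroup the two ordered pairs stay distinct.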

*Example 4.2.* Every polynomial functor is analytic. The bag functor $\mathcal{B}$ is also analytic: its signature $\Sigma = (\{\ast\})_{n\in\mathbb{N}}$ has one operation symbol per arity, and $G_\sigma = \mathfrak{S}_{\mathrm{ar}(\sigma)}$ is the full permutation group on $\mathrm{ar}(\sigma)$ elements. It is the archetype of an analytic functor, in the sense that for every analytic functor $F\colon \mathsf{Set} \longrightarrow \mathsf{Set}$, there is a natural transformation $\alpha\colon F \longrightarrow \mathcal{B}$ into the bag functor. If $F$ is given by $\Sigma$ and $G_\sigma$ as above, then $\alpha_X$ is given by

$$FX = \coprod\_{\sigma \in \Sigma\_n} X^n / G\_{\sigma} \twoheadrightarrow \coprod\_{\sigma \in \Sigma\_n} X^n / \mathfrak{S}\_n \to \coprod\_{n \in \mathbb{N}} X^n / \mathfrak{S}\_n = \mathcal{B}X.$$

**Proposition 4.3.** *For an analytic* Set*-functor* $F$*, the following are equivalent: (1) a map* $f\colon X \longrightarrow FY$ *is* $F$*-precise; (2)* $\alpha_Y \cdot f$ *is* $\mathcal{B}$*-precise; (3) every element of* $Y$ *appears precisely once in the definition of* $f$*, i.e. for every* $y \in Y$*, there is exactly one* $x \in X$ *such that* $f(x)$ *is the equivalence class of a tuple* $(y_1,\ldots,y_n)$ *with* $y_i = y$ *for some index* $i$*, and furthermore this index is unique. Consequently, every analytic functor has precise factorizations w.r.t.* Set*.*

#### **4.2 Nominal Sets: Regular Nondeterministic Nominal Automata**

We derive an open map situation from the coalgebraic situation for *regular nondeterministic nominal automata* (*RNNAs*) [26]. They are an extension of automata that accept *words with binders*, consisting of literals $a \in \mathbf{A}$ and binders $|_a$ for $a \in \mathbf{A}$; the latter also count as length 1. An example of such a word of length 4 is $a|_c bc$, where the last $c$ is bound by $|_c$. The order of binders makes a difference: $|_a|_b\, ab \ne |_a|_b\, ba$. RNNAs are represented coalgebraically in the category of nominal sets [10], a formalism about atoms (e.g. variables) that sit in more complex structures (e.g. lambda terms) and that gives a notion of *binding*. Because the choice principles (Ax4) and (Ax5) are not satisfied by all nominal sets, we instead use the class of *strong nominal sets* for the precise factorization (Definition 3.4).
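Deciding α-equivalence of such words with binders can be sketched with a de Bruijn-style canonical form (a hypothetical encoding, not the paper's formalism; `alpha_canon` is our name): bound literals are replaced by the index of their binder, free literals are kept as atoms.

```python
def alpha_canon(word):
    """Canonical form deciding alpha-equivalence of a word with binders.

    A word is a list of tokens: ("bind", a) for the binder |a, which
    scopes over the rest of the word, or ("lit", a) for a literal a.
    """
    out, bound = [], []              # binders currently in scope, innermost last
    for kind, a in word:
        if kind == "bind":
            bound.append(a)
            out.append("|")
        elif a in bound:             # bound literal -> distance to its
            out.append(len(bound) - 1 -        # innermost binder (shadowing)
                       max(i for i, b in enumerate(bound) if b == a))
        else:                        # free literal: part of the support
            out.append(("free", a))
    return tuple(out)

w1 = [("bind", "a"), ("bind", "b"), ("lit", "a"), ("lit", "b")]  # |a|b ab
w2 = [("bind", "a"), ("bind", "b"), ("lit", "b"), ("lit", "a")]  # |a|b ba
w3 = [("bind", "c"), ("bind", "d"), ("lit", "c"), ("lit", "d")]  # |c|d cd
```

Here `w1` and `w3` get the same canonical form (they differ only by renaming bound atoms), while `w1` and `w2` do not: the order of binders makes a difference.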

**Definition 4.4** [10,24]**.** *Fix a countably infinite set* $\mathbf{A}$*, called the set of* atoms*. For the group* $\mathfrak{S}_{\mathrm{f}}(\mathbf{A})$ *of finite permutations on the set* $\mathbf{A}$*, a* group action $(X, \cdot)$ *is a set* $X$ *together with a group homomorphism* $\cdot\colon \mathfrak{S}_{\mathrm{f}}(\mathbf{A}) \longrightarrow \mathfrak{S}(X)$*, written in infix notation. An element* $x \in X$ *is* supported by $S \subseteq \mathbf{A}$ *if for all* $\pi \in \mathfrak{S}_{\mathrm{f}}(\mathbf{A})$ *with* $\pi(a) = a$ *for all* $a \in S$ *we have* $\pi \cdot x = x$*. A* nominal set *is a group action for* $\mathfrak{S}_{\mathrm{f}}(\mathbf{A})$ *such that every* $x \in X$ *is finitely supported, i.e. supported by a finite* $S \subseteq \mathbf{A}$*. A map* $f\colon (X, \cdot) \longrightarrow (Y, \star)$ *is* equivariant *if for all* $x \in X$ *and* $\pi \in \mathfrak{S}_{\mathrm{f}}(\mathbf{A})$ *we have* $f(\pi \cdot x) = \pi \star f(x)$*. The category of nominal sets and equivariant maps is denoted by* Nom*. A nominal set* $(X, \cdot)$ *is called* strong *if for all* $x \in X$ *and* $\pi \in \mathfrak{S}_{\mathrm{f}}(\mathbf{A})$ *with* $\pi \cdot x = x$ *we have* $\pi(a) = a$ *for all* $a \in \mathrm{supp}(x)$*.*

Intuitively, the support of an element is the set of its free literals. An equivariant map can forget some of the support of an element but can never introduce new atoms, i.e. $\mathrm{supp}(f(x)) \subseteq \mathrm{supp}(x)$. The intuition behind strong nominal sets is that all atoms appear in a fixed order: $\mathbf{A}^n$ is strong, but $\mathcal{P}_{\mathrm{f}}(\mathbf{A})$ (the finite powerset) is not. We set $\mathcal{S}$ to be the class of strong nominal sets:

*Example 4.5.* The Nom-functor of unordered pairs admits precise factorizations w.r.t. strong nominal sets, but not w.r.t. all nominal sets.
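The contrast between $\mathbf{A}^n$ and $\mathcal{P}_{\mathrm{f}}(\mathbf{A})$ noted above can be sketched concretely (a hypothetical encoding, with atoms as strings and a finite permutation as a dict):

```python
def act_tuple(pi, t):
    """Pointwise action of a finite permutation on a tuple of atoms."""
    return tuple(pi.get(a, a) for a in t)

def act_set(pi, s):
    """Elementwise action of a finite permutation on a finite set of atoms."""
    return frozenset(pi.get(a, a) for a in s)

swap_ab = {"a": "b", "b": "a"}   # the transposition (a b)

# Pf(A) is not strong: (a b) fixes the *set* {a, b} even though it moves
# the atom a, which lies in the set's support.
fixes_set = act_set(swap_ab, frozenset({"a", "b"})) == frozenset({"a", "b"})
# A^2 behaves like a strong nominal set: (a b) does not fix the tuple (a, b),
# so no non-trivial permutation of the support stabilises it.
fixes_tuple = act_tuple(swap_ab, ("a", "b")) == ("a", "b")
```

So `fixes_set` is `True` while `fixes_tuple` is `False`, witnessing that the strongness condition fails for the finite powerset but not for tuples.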

In the application, we fix as the pointing the set $I = \mathbf{A}^{\#n}$ of distinct $n$-tuples of atoms ($n \ge 0$). The hom-sets $\mathsf{Nom}(X, \mathcal{P}_{\mathrm{ufs}}Y)$ are ordered point-wise.

**Proposition 4.6.** *The* uniformly finitely supported powerset $\mathcal{P}_{\mathrm{ufs}}(X) = \{Y \subseteq X \mid \bigcup_{y\in Y} \mathrm{supp}(y)\ \textit{finite}\}$ *satisfies* (Ax2–5) *w.r.t. the class* $\mathcal{S}$ *of strong nominal sets.*<sup>1</sup>

As for F, we study an LTS-like functor, extended with the *binding functor* [10]:

**Definition 4.7.** *For a nominal set* $X$*, define the* α*-equivalence relation* $\sim_\alpha$ *on* $\mathbf{A} \times X$ *by:* $(a, x) \sim_\alpha (b, y) \Leftrightarrow \exists c \in \mathbf{A} \setminus (\mathrm{supp}(x) \cup \mathrm{supp}(y))$ *with* $(a\ c) \cdot x = (b\ c) \cdot y$*. Denote the quotient by* $[\mathbf{A}]X := (\mathbf{A} \times X)/{\sim_\alpha}$*. The assignment* $X \mapsto [\mathbf{A}]X$ *extends to a functor, called the* binding functor $[\mathbf{A}]\colon \mathsf{Nom} \longrightarrow \mathsf{Nom}$*.*

RNNAs are precisely $\mathcal{P}_{\mathrm{ufs}}F$-coalgebras for $FX = \{\checkmark\} + [\mathbf{A}]X + \mathbf{A} \times X$ [26]. In this paper we additionally consider initial states for RNNAs.

**Proposition 4.8.** *The binding functor* $[\mathbf{A}]$ *admits precise factorizations w.r.t. strong nominal sets, and so does* $FX = \{\checkmark\} + [\mathbf{A}]X + \mathbf{A} \times X$*.*

An element of $\mathsf{PathOrd}(\mathbf{A}^{\#n}, F)$ may be regarded as a word-in-context: a word with binders $w$ under a context $\vec{a} \in \mathbf{A}^{\#n}$, where every literal in $w$ is bound or appears in $\vec{a}$, and $w$ may end with $\checkmark$. Moreover, two words-in-context, with words $w, w'$ and contexts $\vec{a} = (a_1,\ldots,a_n)$, $\vec{a}' = (a'_1,\ldots,a'_n)$, are identified if their closures are α-equivalent, that is, if $|_{a_1}\cdots|_{a_n} w = |_{a'_1}\cdots|_{a'_n} w'$. The trace semantics of an RNNA $T$ contains all the words-in-context corresponding to runs in $T$; it distinguishes whether words end with $\checkmark$.

#### **4.3 Subsuming Arbitrary Open Morphism Situations**

Lasota [19] provides a translation of a small path category $\mathbf{P} \hookrightarrow \mathbf{M}$ into a functor $\mathbf{F}\colon \mathsf{Set}^{\mathrm{obj}\,\mathbf{P}} \longrightarrow \mathsf{Set}^{\mathrm{obj}\,\mathbf{P}}$ defined by

$$\mathbf{F}(X\_P)\_{P \in \mathbb{P}} = \Big(\prod\_{Q \in \mathbb{P}} \mathcal{P}(X\_Q)^{\mathbb{P}(P,Q)}\Big)\_{P \in \mathbb{P}}.$$

<sup>1</sup> There are two variants of powersets discussed in [26]. The finite powerset $\mathcal{P}_{\mathrm{f}}$ also fulfils the axioms. However, the *finitely supported* powerset $\mathcal{P}_{\mathrm{fs}}$ does not fulfil (Ax5).

So the hom-sets $\mathsf{Set}^{\mathrm{obj}\,\mathbf{P}}(X, \mathbf{F}Y)$ have a canonical order, namely point-wise inclusion. This induces a functor Beh from $\mathbf{M}$ to $\mathbf{F}$-coalgebras and lax coalgebra homomorphisms, and Lasota shows that $f \in \mathbf{M}(X, Y)$ is $\mathbf{P}$-open iff $\mathrm{Beh}(f)$ is a coalgebra homomorphism. In the following, we show that we can apply our framework to $\mathbf{F}$ via a suitable decomposition $\mathbf{F} = TF$ and a suitable object $I$ for the initial-state pointing. As usual in open map papers, we require that $\mathbf{P}$ and $\mathbf{M}$ have a common initial object $0_{\mathbf{P}}$. Observe that we have $\mathbf{F} = T \cdot F$ where

$$T(X\_P)\_{P \in \mathbb{P}} = \left(\mathcal{P}(X\_P)\right)\_{P \in \mathbb{P}} \quad \text{and} \quad F(X\_P)\_{P \in \mathbb{P}} = \left(\coprod\_{Q \in \mathbb{P}} \mathbb{P}(P, Q) \times X\_Q\right)\_{P \in \mathbb{P}}.$$

Lasota considers coalgebras without pointing, but one has a canonical pointing as follows. For $P \in \mathbf{P}$, define the characteristic family $\chi^P \in \mathsf{Set}^{\mathrm{obj}\,\mathbf{P}}$ by $\chi^P_Q = 1$ if $P = Q$ and $\chi^P_Q = \emptyset$ if $P \ne Q$. With this, we fix the pointing $I = \chi^{0_{\mathbf{P}}}$.

**Proposition 4.9.** $T$*,* $F$ *and* $I$ *satisfy the axioms from Table 2, with* $\mathcal{S} = \mathsf{Set}^{\mathrm{obj}\,\mathbf{P}}$*.*

The path category in Coalgl(I,TF) from our theory can be described as follows.

**Proposition 4.10.** *An object of* Path(I,F) *is a sequence of composable* $\mathbf{P}$*-morphisms* $0_{\mathbf{P}} \xrightarrow{m_1} P_1 \xrightarrow{m_2} P_2 \cdots \xrightarrow{m_n} P_n$*.*

# **5 Conclusions and Further Work**

We proved that coalgebra homomorphisms for systems with non-deterministic branching can be seen as open maps for a canonical path category constructed from the computation type $F$. This limitation to non-deterministic systems is unsurprising: as we proved in Sect. 4.3 on Lasota's work [19], every open map situation can be encoded as a coalgebra situation with a powerset-like functor, hence with non-deterministic branching. As future work, we would like to extend this theory of path categories to coalgebras for further kinds of branching, especially probabilistic and weighted. This will require (1) adapting open maps to allow those kinds of branching and (2) adapting the axioms from Table 2, replacing the "+1" part of (Ax1) by something depending on the branching type.

# **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Author Index

Akshay, S. 260 Alcolei, Aurore 27 Alvarez-Picallo, Mario 45

Baier, Christel 436 Barlocco, Simone 62 Ben-Amram, Amir M. 80 Biernacki, Dariusz 98 Bollig, Benedikt 115 Bouyer, Patricia 115 Cadilhac, Michaël 133 Castellan, Simon 150 Clairambault, Pierre 27 Colcombet, Thomas 1 Corradini, Andrea 169 Dartois, Luc 189 Doumane, Amina 207 Dubut, Jérémy 224, 523 Echenim, Mnacho 242 Fijalkow, Nathanaël 1 Filiot, Emmanuel 189 Gupta, Utkarsh 260 Hamilton, Geoff W. 80 Hasuo, Ichiro 523 Hausmann, Daniel 277 Heindel, Tobias 169 Hofman, Piotr 260 Hugunin, Jasper 295 Iosif, Radu 242 Jacobs, Bart 313 Katsumata, Shin-ya 523 Kerjean, Marie 330 Kissinger, Aleks 313 König, Barbara 169

Kuperberg, Denis 207 Kupke, Clemens 62 Kuske, Dietrich 348

Laurent, Olivier 27 Lenglet, Sergueï 98 Leventis, Thomas 365 Lucas, Christophe 418

Maneth, Sebastian 488 Matache, Cristina 382 Milius, Stefan 400 Mio, Matteo 418

Nolte, Dennis 169

Ong, C.-H. Luke 45

Pacaud Lemay, Jean-Simon 330 Pagani, Michele 365 Palenta, Raphaela 488 Peltier, Nicolas 242 Pérez, Guillermo A. 133 Piribauer, Jakob 436 Piróg, Maciej 453 Polesiuk, Piotr 98, 453 Pous, Damien 207 Pradic, Pierre 207, 470

Reiter, Fabian 115 Rensink, Arend 169 Riba, Colin 470 Rot, Jurriaan 62

Schröder, Lutz 277 Seidl, Helmut 488 Shah, Preey 260 Sieczkowski, Filip 453 Staton, Sam 382

Talbot, Jean-Marc 189

Urbat, Henning 400

van den Bogaard, Marie 133 van Glabbeek, Rob 505

Wißmann, Thorsten 523

Yoshida, Nobuko 150

Zanasi, Fabio 313 Zetzsche, Georg 348